Salary Prediction Classification - Project TXAI¶
Authors:
- Martín Romero Romero
- Valentina Isabel Ortega Pinto
In [1]:
import os
# Java 8 is required for compatibility with JFML (IEEE Std 1855-2016)
os.environ["JAVA_HOME"] = "/usr/lib/jvm/java-8-openjdk-amd64"
!java -version
openjdk version "1.8.0_392"
OpenJDK Runtime Environment (build 1.8.0_392-8u392-ga-1~22.04-b08)
OpenJDK 64-Bit Server VM (build 25.392-b08, mixed mode)
In [2]:
os.system("pip install simplenlg --quiet")
os.system("pip install tabulate --quiet")
os.system("pip install interpret --quiet")
os.system("pip install xgboost --quiet")
os.system("pip install opencv-python --quiet")
os.system("pip install facets-overview --quiet")
os.system("pip install dice-ml --quiet")
os.system("pip install tensorflow --quiet")
# FAT-Forensics
os.system("pip install fat-forensics --quiet")
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
fairlearn 0.10.0 requires pandas>=2.0.3, but you have pandas 1.5.3 which is incompatible.
Out[2]:
0
In [3]:
import warnings
warnings.filterwarnings('ignore')
import sys
sys.path.append('../')
# Loading plot tools (for plotting fuzzy sets and rules)
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
from IPython.display import SVG, HTML, display
import base64
from facets_overview.generic_feature_statistics_generator import GenericFeatureStatisticsGenerator
# Loading csv package for reading data files
import pandas as pd
import seaborn as sns
from tabulate import tabulate
# Loading lib to support handling of confusion matrix
import numpy as np
from numpy.random.mtrand import randint
# Loading lib to support handling of iterators
import itertools
# Loading lib to deal with arff files (Weka format)
from scipy.io.arff import loadarff
# Loading some models from sklearn for performance comparison
import sklearn
from sklearn import preprocessing
from sklearn import tree
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import cross_validate
from sklearn.model_selection import train_test_split
from sklearn.tree import export_text
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.compose import ColumnTransformer
# Loading libs for XAI comparison
from interpret import set_visualize_provider
from interpret.provider import InlineProvider
set_visualize_provider(InlineProvider())
from interpret.glassbox import ExplainableBoostingClassifier
from interpret import show
from numba import *
import xgboost
import shap
from xai.ExplainGAM import ExplainGAMClassifier
import graphviz
import dice_ml
from dice_ml.utils import helpers # helper functions
import cv2 as cv
from examples.common import hog
from xsvmlib.xsvmc import xSVMC
import time
# Loading the Realiser and corpus
from simplenlg.framework import *
from simplenlg.lexicon import *
from simplenlg.realiser.english import *
from simplenlg.phrasespec import *
from simplenlg.features import *
lexicon = Lexicon.getDefaultLexicon()
nlgFactory = NLGFactory(lexicon)
realiser = Realiser(lexicon)
# Loading JFML library
from py4j.java_gateway import JavaGateway
from py4jfml.Py4Jfml import Py4jfml
# opening the JVM and Server for accessing to JFML
gateway = JavaGateway()
# Loading FCFEXPGEN library
from json import load
import re
from copy import deepcopy
from fcfexpgen.feature import Attribute
from fcfexpgen.simple_rule import Simple_Rule
from fcfexpgen.complex_rule import Complex_Rule
from fcfexpgen.factual_explanation import Factual_Explanation
from fcfexpgen.CF_explanation import CF_Explanation
import fatf
import fatf.utils.data.datasets as fatf_datasets
import fatf.utils.data.tools as fatf_data_tools
import fatf.utils.data.density as fatf_density
import fatf.utils.models as fatf_models
import fatf.utils.metrics.tools as fatf_metrics_tools
import fatf.utils.metrics.metrics as fatf_metrics
import fatf.utils.metrics.subgroup_metrics as fatf_smt
import fatf.accountability.models.measures as fatf_accountability_models
import fatf.accountability.data.measures as fatf_accountability_data
import fatf.accountability.data.measures as fatf_dam
24-Mar-08 10:11:33 fatf.utils.array.tools INFO Using numpy's numpy.lib.recfunctions.structured_to_unstructured as fatf.utils.array.tools.structured_to_unstructured and fatf.utils.array.tools.structured_to_unstructured_row.
Data Processing¶
In [4]:
pip install category_encoders
Requirement already satisfied: category_encoders in /opt/conda/lib/python3.11/site-packages (2.6.3)
Requirement already satisfied: numpy>=1.14.0 in /opt/conda/lib/python3.11/site-packages (from category_encoders) (1.24.4)
Requirement already satisfied: scikit-learn>=0.20.0 in /opt/conda/lib/python3.11/site-packages (from category_encoders) (1.3.1)
Requirement already satisfied: scipy>=1.0.0 in /opt/conda/lib/python3.11/site-packages (from category_encoders) (1.11.3)
Requirement already satisfied: statsmodels>=0.9.0 in /opt/conda/lib/python3.11/site-packages (from category_encoders) (0.14.0)
Requirement already satisfied: pandas>=1.0.5 in /opt/conda/lib/python3.11/site-packages (from category_encoders) (1.5.3)
Requirement already satisfied: patsy>=0.5.1 in /opt/conda/lib/python3.11/site-packages (from category_encoders) (0.5.3)
Requirement already satisfied: python-dateutil>=2.8.1 in /opt/conda/lib/python3.11/site-packages (from pandas>=1.0.5->category_encoders) (2.8.2)
Requirement already satisfied: pytz>=2020.1 in /opt/conda/lib/python3.11/site-packages (from pandas>=1.0.5->category_encoders) (2023.3.post1)
Requirement already satisfied: six in /opt/conda/lib/python3.11/site-packages (from patsy>=0.5.1->category_encoders) (1.16.0)
Requirement already satisfied: joblib>=1.1.1 in /opt/conda/lib/python3.11/site-packages (from scikit-learn>=0.20.0->category_encoders) (1.3.2)
Requirement already satisfied: threadpoolctl>=2.0.0 in /opt/conda/lib/python3.11/site-packages (from scikit-learn>=0.20.0->category_encoders) (3.2.0)
Requirement already satisfied: packaging>=21.3 in /opt/conda/lib/python3.11/site-packages (from statsmodels>=0.9.0->category_encoders) (23.2)
Note: you may need to restart the kernel to use updated packages.
In [5]:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
import itertools
import scipy.stats as ss
from category_encoders import TargetEncoder
from sklearn.metrics import classification_report
In [6]:
csv_file_path='data/salary.csv'
df = pd.read_csv(csv_file_path)
In [7]:
df
Out[7]:
| age | workclass | fnlwgt | education | education-num | marital-status | occupation | relationship | race | sex | capital-gain | capital-loss | hours-per-week | native-country | salary | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 39 | State-gov | 77516 | Bachelors | 13 | Never-married | Adm-clerical | Not-in-family | White | Male | 2174 | 0 | 40 | United-States | <=50K |
| 1 | 50 | Self-emp-not-inc | 83311 | Bachelors | 13 | Married-civ-spouse | Exec-managerial | Husband | White | Male | 0 | 0 | 13 | United-States | <=50K |
| 2 | 38 | Private | 215646 | HS-grad | 9 | Divorced | Handlers-cleaners | Not-in-family | White | Male | 0 | 0 | 40 | United-States | <=50K |
| 3 | 53 | Private | 234721 | 11th | 7 | Married-civ-spouse | Handlers-cleaners | Husband | Black | Male | 0 | 0 | 40 | United-States | <=50K |
| 4 | 28 | Private | 338409 | Bachelors | 13 | Married-civ-spouse | Prof-specialty | Wife | Black | Female | 0 | 0 | 40 | Cuba | <=50K |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 32556 | 27 | Private | 257302 | Assoc-acdm | 12 | Married-civ-spouse | Tech-support | Wife | White | Female | 0 | 0 | 38 | United-States | <=50K |
| 32557 | 40 | Private | 154374 | HS-grad | 9 | Married-civ-spouse | Machine-op-inspct | Husband | White | Male | 0 | 0 | 40 | United-States | >50K |
| 32558 | 58 | Private | 151910 | HS-grad | 9 | Widowed | Adm-clerical | Unmarried | White | Female | 0 | 0 | 40 | United-States | <=50K |
| 32559 | 22 | Private | 201490 | HS-grad | 9 | Never-married | Adm-clerical | Own-child | White | Male | 0 | 0 | 20 | United-States | <=50K |
| 32560 | 52 | Self-emp-inc | 287927 | HS-grad | 9 | Married-civ-spouse | Exec-managerial | Wife | White | Female | 15024 | 0 | 40 | United-States | >50K |
32561 rows × 15 columns
In [8]:
df = df.drop(columns=['capital-gain','capital-loss','education','fnlwgt'])
In [9]:
df['salary'] = df['salary'].str.strip()
df['salary_binary'] = df['salary'].apply(lambda x: 1 if '>50K' in x else 0)
In [10]:
df
Out[10]:
| age | workclass | education-num | marital-status | occupation | relationship | race | sex | hours-per-week | native-country | salary | salary_binary | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 39 | State-gov | 13 | Never-married | Adm-clerical | Not-in-family | White | Male | 40 | United-States | <=50K | 0 |
| 1 | 50 | Self-emp-not-inc | 13 | Married-civ-spouse | Exec-managerial | Husband | White | Male | 13 | United-States | <=50K | 0 |
| 2 | 38 | Private | 9 | Divorced | Handlers-cleaners | Not-in-family | White | Male | 40 | United-States | <=50K | 0 |
| 3 | 53 | Private | 7 | Married-civ-spouse | Handlers-cleaners | Husband | Black | Male | 40 | United-States | <=50K | 0 |
| 4 | 28 | Private | 13 | Married-civ-spouse | Prof-specialty | Wife | Black | Female | 40 | Cuba | <=50K | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 32556 | 27 | Private | 12 | Married-civ-spouse | Tech-support | Wife | White | Female | 38 | United-States | <=50K | 0 |
| 32557 | 40 | Private | 9 | Married-civ-spouse | Machine-op-inspct | Husband | White | Male | 40 | United-States | >50K | 1 |
| 32558 | 58 | Private | 9 | Widowed | Adm-clerical | Unmarried | White | Female | 40 | United-States | <=50K | 0 |
| 32559 | 22 | Private | 9 | Never-married | Adm-clerical | Own-child | White | Male | 20 | United-States | <=50K | 0 |
| 32560 | 52 | Self-emp-inc | 9 | Married-civ-spouse | Exec-managerial | Wife | White | Female | 40 | United-States | >50K | 1 |
32561 rows × 12 columns
In [11]:
## Correlation matrix with numerical data
df_with_edu = df.drop(['workclass','marital-status', 'occupation','relationship', 'race', 'sex', 'native-country','salary'], axis=1)
corr_matrix = df_with_edu.corr().abs()
# Plot only each feature's correlation with the class (excluding the class itself) to highlight the most correlated features
target_corr = corr_matrix.iloc[:-1, -1]
# Creating a df for using seaborn
target_corr_df = target_corr.to_frame().transpose()
plt.figure(figsize=(10, 10))
sns.heatmap(target_corr_df, annot=True, fmt=".2f", cmap='coolwarm', square=True, linewidths=.5, cbar_kws={"orientation": "horizontal"})
plt.title('Correlation with Numeric Data')
plt.show()
In [12]:
df_without_num=df.drop(['age','education-num','hours-per-week', 'salary_binary'],axis=1)
column_combinations = list(itertools.combinations(df_without_num.columns, 2))
# Initialize a DataFrame to store the pairwise association coefficients
df_corr = pd.DataFrame(index=df_without_num.columns, columns=df_without_num.columns)
# Compute Cramér's V for each pair of columns
for col1, col2 in column_combinations:
    contingency_matrix = pd.crosstab(df_without_num[col1], df_without_num[col2])
    chi2 = ss.chi2_contingency(contingency_matrix)[0]
    n = contingency_matrix.values.sum()
    cramers_v = np.sqrt(chi2 / (n * (min(contingency_matrix.shape) - 1)))
    df_corr.loc[col1, col2] = cramers_v
    df_corr.loc[col2, col1] = cramers_v  # the matrix is symmetric
# Cramér's V already lies in [0, 1], so no further normalization is needed
df_corr = df_corr.astype(float)
# Create a diverging colormap
cmap = sns.diverging_palette(250, 10, as_cmap=True)
# Plot the association matrix as a heatmap
plt.figure(figsize=(15, 15))
sns.heatmap(df_corr, annot=True, cmap=cmap, vmin=0, vmax=1, square=True)
plt.title('Contingency Coefficient Matrix (Cramer\'s V)')
plt.show()
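As a standalone sanity check, Cramér's V can be computed by hand from the chi-squared statistic as sqrt(chi2 / (n · (min(rows, cols) − 1))). A minimal NumPy sketch on two made-up 2×2 tables (note that scipy's `chi2_contingency` applies Yates continuity correction to 2×2 tables by default, so its values can differ slightly from this uncorrected computation):

```python
import numpy as np

def cramers_v(table) -> float:
    """Cramér's V = sqrt(chi2 / (n * (min(r, c) - 1))) for a contingency table."""
    table = np.asarray(table, dtype=float)
    n = table.sum()
    # Expected counts under independence: outer product of the margins / n
    expected = np.outer(table.sum(axis=1), table.sum(axis=0)) / n
    chi2 = ((table - expected) ** 2 / expected).sum()
    r, c = table.shape
    return float(np.sqrt(chi2 / (n * (min(r, c) - 1))))

print(cramers_v([[50, 0], [0, 50]]))    # perfect association -> 1.0
print(cramers_v([[25, 25], [25, 25]]))  # independence -> 0.0
```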
In [13]:
target = df['salary_binary']
In [14]:
target
Out[14]:
0 0
1 0
2 0
3 0
4 0
..
32556 0
32557 1
32558 0
32559 0
32560 1
Name: salary_binary, Length: 32561, dtype: int64
In [15]:
df
Out[15]:
| age | workclass | education-num | marital-status | occupation | relationship | race | sex | hours-per-week | native-country | salary | salary_binary | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 39 | State-gov | 13 | Never-married | Adm-clerical | Not-in-family | White | Male | 40 | United-States | <=50K | 0 |
| 1 | 50 | Self-emp-not-inc | 13 | Married-civ-spouse | Exec-managerial | Husband | White | Male | 13 | United-States | <=50K | 0 |
| 2 | 38 | Private | 9 | Divorced | Handlers-cleaners | Not-in-family | White | Male | 40 | United-States | <=50K | 0 |
| 3 | 53 | Private | 7 | Married-civ-spouse | Handlers-cleaners | Husband | Black | Male | 40 | United-States | <=50K | 0 |
| 4 | 28 | Private | 13 | Married-civ-spouse | Prof-specialty | Wife | Black | Female | 40 | Cuba | <=50K | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 32556 | 27 | Private | 12 | Married-civ-spouse | Tech-support | Wife | White | Female | 38 | United-States | <=50K | 0 |
| 32557 | 40 | Private | 9 | Married-civ-spouse | Machine-op-inspct | Husband | White | Male | 40 | United-States | >50K | 1 |
| 32558 | 58 | Private | 9 | Widowed | Adm-clerical | Unmarried | White | Female | 40 | United-States | <=50K | 0 |
| 32559 | 22 | Private | 9 | Never-married | Adm-clerical | Own-child | White | Male | 20 | United-States | <=50K | 0 |
| 32560 | 52 | Self-emp-inc | 9 | Married-civ-spouse | Exec-managerial | Wife | White | Female | 40 | United-States | >50K | 1 |
32561 rows × 12 columns
In [16]:
print(df['workclass'].nunique())
print(df['workclass'].unique())
9 [' State-gov' ' Self-emp-not-inc' ' Private' ' Federal-gov' ' Local-gov' ' ?' ' Self-emp-inc' ' Without-pay' ' Never-worked']
In [17]:
print(df['marital-status'].nunique())
print(df['marital-status'].unique())
7 [' Never-married' ' Married-civ-spouse' ' Divorced' ' Married-spouse-absent' ' Separated' ' Married-AF-spouse' ' Widowed']
In [18]:
print(df['occupation'].nunique())
print(df['occupation'].unique())
15 [' Adm-clerical' ' Exec-managerial' ' Handlers-cleaners' ' Prof-specialty' ' Other-service' ' Sales' ' Craft-repair' ' Transport-moving' ' Farming-fishing' ' Machine-op-inspct' ' Tech-support' ' ?' ' Protective-serv' ' Armed-Forces' ' Priv-house-serv']
In [19]:
print(df['relationship'].nunique())
print(df['relationship'].unique())
6 [' Not-in-family' ' Husband' ' Wife' ' Own-child' ' Unmarried' ' Other-relative']
In [20]:
print(df['race'].nunique())
print(df['race'].unique())
5 [' White' ' Black' ' Asian-Pac-Islander' ' Amer-Indian-Eskimo' ' Other']
In [21]:
print(df['sex'].nunique())
print(df['sex'].unique())
2 [' Male' ' Female']
In [22]:
print(df['native-country'].nunique())
print(df['native-country'].unique())
42 [' United-States' ' Cuba' ' Jamaica' ' India' ' ?' ' Mexico' ' South' ' Puerto-Rico' ' Honduras' ' England' ' Canada' ' Germany' ' Iran' ' Philippines' ' Italy' ' Poland' ' Columbia' ' Cambodia' ' Thailand' ' Ecuador' ' Laos' ' Taiwan' ' Haiti' ' Portugal' ' Dominican-Republic' ' El-Salvador' ' France' ' Guatemala' ' China' ' Japan' ' Yugoslavia' ' Peru' ' Outlying-US(Guam-USVI-etc)' ' Scotland' ' Trinadad&Tobago' ' Greece' ' Nicaragua' ' Vietnam' ' Hong' ' Ireland' ' Hungary' ' Holand-Netherlands']
Target encoding¶
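Target encoding replaces each category with (a smoothed version of) the mean of the binary target among the rows carrying that category. Before applying `category_encoders.TargetEncoder` below, here is the unsmoothed idea on a made-up toy frame (the real encoder additionally shrinks rare categories toward the global mean):

```python
import pandas as pd

# Toy frame: hypothetical countries and a binary salary target
toy = pd.DataFrame({
    "country": ["US", "US", "US", "MX", "MX"],
    "over_50k": [1, 0, 1, 0, 0],
})
# Unsmoothed target encoding: each category becomes its per-category target mean
means = toy.groupby("country")["over_50k"].mean()  # US -> 2/3, MX -> 0.0
toy["country_encoded"] = toy["country"].map(means)
print(toy["country_encoded"].tolist())
```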
In [23]:
encoder = TargetEncoder()
# Fit the encoder on the 'native-country' column
encoder.fit(df['native-country'], target)
# Transform the 'native-country' column
df['native-country_encoded'] = encoder.transform(df['native-country'])
# Drop the original 'native-country' column if needed
df.drop(columns=['native-country'], inplace=True)
In [24]:
encoder = TargetEncoder()
# Fit the encoder on the 'workclass' column
encoder.fit(df['workclass'], target)
# Transform the 'workclass' column
df['workclass_encoded'] = encoder.transform(df['workclass'])
# Drop the original 'workclass' column
df.drop(columns=['workclass'], inplace=True)
In [25]:
encoder = TargetEncoder()
# Fit the encoder on the 'marital-status' column
encoder.fit(df['marital-status'], target)
# Transform the 'marital-status' column
df['marital-status_encoded'] = encoder.transform(df['marital-status'])
# Drop the original 'marital-status' column
df.drop(columns=['marital-status'], inplace=True)
In [26]:
encoder = TargetEncoder()
# Fit the encoder on the 'occupation' column
encoder.fit(df['occupation'], target)
# Transform the 'occupation' column
df['occupations_encoded'] = encoder.transform(df['occupation'])
# Drop the original 'occupation' column
df.drop(columns=['occupation'], inplace=True)
In [27]:
encoder = TargetEncoder()
# Fit the encoder on the 'relationship' column
encoder.fit(df['relationship'], target)
# Transform the 'relationship' column
df['relationship_encoded'] = encoder.transform(df['relationship'])
# Drop the original 'relationship' column
df.drop(columns=['relationship'], inplace=True)
In [28]:
encoder = TargetEncoder()
# Fit the encoder on the 'race' column
encoder.fit(df['race'], target)
# Transform the 'race' column
df['race_encoded'] = encoder.transform(df['race'])
# Drop the original 'race' column
df.drop(columns=['race'], inplace=True)
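The five encoding cells above all follow the same fit/transform/drop pattern, so the whole job can be expressed once as a loop. A self-contained sketch on a toy frame, using unsmoothed per-category target means as a stand-in for `category_encoders.TargetEncoder` (which also smooths toward the global mean):

```python
import pandas as pd

def target_encode_columns(df: pd.DataFrame, target: pd.Series, cols) -> pd.DataFrame:
    """Replace each categorical column with its per-category target mean.

    Unsmoothed illustration of the TargetEncoder loop; not the library itself."""
    out = df.copy()
    for col in cols:
        means = target.groupby(out[col]).mean()
        out[col + "_encoded"] = out[col].map(means)
        out = out.drop(columns=[col])
    return out

toy = pd.DataFrame({"workclass": ["Private", "State-gov", "Private"]})
toy_target = pd.Series([1, 0, 0])
print(target_encode_columns(toy, toy_target, ["workclass"]))
```

With the real data this would be `df = target_encode_columns(df, target, ['workclass', 'marital-status', 'occupation', 'relationship', 'race'])`, modulo the smoothing difference.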
In [29]:
df['sex_encoded'] = df['sex'].apply(lambda x: 1 if 'Male' in x else 0)
df.drop(columns=['sex'],inplace=True)
In [30]:
df = df.drop(columns=['salary'])
In [31]:
df
Out[31]:
| age | education-num | hours-per-week | salary_binary | native-country_encoded | workclass_encoded | marital-status_encoded | occupations_encoded | relationship_encoded | race_encoded | sex_encoded | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 39 | 13 | 40 | 0 | 0.245835 | 0.271957 | 0.045961 | 0.134483 | 0.103070 | 0.25586 | 1 |
| 1 | 50 | 13 | 13 | 0 | 0.245835 | 0.284927 | 0.446848 | 0.484014 | 0.448571 | 0.25586 | 1 |
| 2 | 38 | 9 | 40 | 0 | 0.245835 | 0.218673 | 0.104209 | 0.062774 | 0.103070 | 0.25586 | 1 |
| 3 | 53 | 7 | 40 | 0 | 0.245835 | 0.218673 | 0.446848 | 0.062774 | 0.448571 | 0.12388 | 1 |
| 4 | 28 | 13 | 40 | 0 | 0.263146 | 0.218673 | 0.446848 | 0.449034 | 0.475128 | 0.12388 | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 32556 | 27 | 12 | 38 | 0 | 0.245835 | 0.218673 | 0.446848 | 0.304957 | 0.475128 | 0.25586 | 0 |
| 32557 | 40 | 9 | 40 | 1 | 0.245835 | 0.218673 | 0.446848 | 0.124875 | 0.448571 | 0.25586 | 1 |
| 32558 | 58 | 9 | 40 | 0 | 0.245835 | 0.218673 | 0.085599 | 0.134483 | 0.063262 | 0.25586 | 0 |
| 32559 | 22 | 9 | 20 | 0 | 0.245835 | 0.218673 | 0.045961 | 0.134483 | 0.013220 | 0.25586 | 1 |
| 32560 | 52 | 9 | 40 | 1 | 0.245835 | 0.557348 | 0.446848 | 0.484014 | 0.475128 | 0.25586 | 0 |
32561 rows × 11 columns
In [32]:
# Get a list of column names excluding the target column
columns = [col for col in df.columns if col != 'salary_binary']
# Append the target column at the end of the list
columns.append('salary_binary')
# Reindex the DataFrame with the new column order
df = df.reindex(columns=columns)
In [33]:
df
Out[33]:
| age | education-num | hours-per-week | native-country_encoded | workclass_encoded | marital-status_encoded | occupations_encoded | relationship_encoded | race_encoded | sex_encoded | salary_binary | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 39 | 13 | 40 | 0.245835 | 0.271957 | 0.045961 | 0.134483 | 0.103070 | 0.25586 | 1 | 0 |
| 1 | 50 | 13 | 13 | 0.245835 | 0.284927 | 0.446848 | 0.484014 | 0.448571 | 0.25586 | 1 | 0 |
| 2 | 38 | 9 | 40 | 0.245835 | 0.218673 | 0.104209 | 0.062774 | 0.103070 | 0.25586 | 1 | 0 |
| 3 | 53 | 7 | 40 | 0.245835 | 0.218673 | 0.446848 | 0.062774 | 0.448571 | 0.12388 | 1 | 0 |
| 4 | 28 | 13 | 40 | 0.263146 | 0.218673 | 0.446848 | 0.449034 | 0.475128 | 0.12388 | 0 | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 32556 | 27 | 12 | 38 | 0.245835 | 0.218673 | 0.446848 | 0.304957 | 0.475128 | 0.25586 | 0 | 0 |
| 32557 | 40 | 9 | 40 | 0.245835 | 0.218673 | 0.446848 | 0.124875 | 0.448571 | 0.25586 | 1 | 1 |
| 32558 | 58 | 9 | 40 | 0.245835 | 0.218673 | 0.085599 | 0.134483 | 0.063262 | 0.25586 | 0 | 0 |
| 32559 | 22 | 9 | 20 | 0.245835 | 0.218673 | 0.045961 | 0.134483 | 0.013220 | 0.25586 | 1 | 0 |
| 32560 | 52 | 9 | 40 | 0.245835 | 0.557348 | 0.446848 | 0.484014 | 0.475128 | 0.25586 | 0 | 1 |
32561 rows × 11 columns
In [34]:
target
Out[34]:
0 0
1 0
2 0
3 0
4 0
..
32556 0
32557 1
32558 0
32559 0
32560 1
Name: salary_binary, Length: 32561, dtype: int64
Correlation Matrix for encoded data¶
In [35]:
## Correlation matrix with numerical data
corr_matrix = df.corr().abs()
# Plot only each feature's correlation with the class (excluding the class itself) to highlight the most correlated features
target_corr = corr_matrix.iloc[:-1, -1]
# Creating a df for using seaborn
target_corr_df = target_corr.to_frame().transpose()
plt.figure(figsize=(15, 15))
sns.heatmap(target_corr_df, annot=True, fmt=".2f", cmap='coolwarm', square=True, linewidths=.5, cbar_kws={"orientation": "horizontal"})
plt.title('Correlation with Encoded Data')
plt.show()
In [36]:
df = df.drop(columns=['salary_binary'])
In [37]:
df
Out[37]:
| age | education-num | hours-per-week | native-country_encoded | workclass_encoded | marital-status_encoded | occupations_encoded | relationship_encoded | race_encoded | sex_encoded | |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 39 | 13 | 40 | 0.245835 | 0.271957 | 0.045961 | 0.134483 | 0.103070 | 0.25586 | 1 |
| 1 | 50 | 13 | 13 | 0.245835 | 0.284927 | 0.446848 | 0.484014 | 0.448571 | 0.25586 | 1 |
| 2 | 38 | 9 | 40 | 0.245835 | 0.218673 | 0.104209 | 0.062774 | 0.103070 | 0.25586 | 1 |
| 3 | 53 | 7 | 40 | 0.245835 | 0.218673 | 0.446848 | 0.062774 | 0.448571 | 0.12388 | 1 |
| 4 | 28 | 13 | 40 | 0.263146 | 0.218673 | 0.446848 | 0.449034 | 0.475128 | 0.12388 | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 32556 | 27 | 12 | 38 | 0.245835 | 0.218673 | 0.446848 | 0.304957 | 0.475128 | 0.25586 | 0 |
| 32557 | 40 | 9 | 40 | 0.245835 | 0.218673 | 0.446848 | 0.124875 | 0.448571 | 0.25586 | 1 |
| 32558 | 58 | 9 | 40 | 0.245835 | 0.218673 | 0.085599 | 0.134483 | 0.063262 | 0.25586 | 0 |
| 32559 | 22 | 9 | 20 | 0.245835 | 0.218673 | 0.045961 | 0.134483 | 0.013220 | 0.25586 | 1 |
| 32560 | 52 | 9 | 40 | 0.245835 | 0.557348 | 0.446848 | 0.484014 | 0.475128 | 0.25586 | 0 |
32561 rows × 10 columns
In [38]:
feature_names_array = df.columns.values
print(feature_names_array)
['age' 'education-num' 'hours-per-week' 'native-country_encoded' 'workclass_encoded' 'marital-status_encoded' 'occupations_encoded' 'relationship_encoded' 'race_encoded' 'sex_encoded']
In [39]:
target
Out[39]:
0 0
1 0
2 0
3 0
4 0
..
32556 0
32557 1
32558 0
32559 0
32560 1
Name: salary_binary, Length: 32561, dtype: int64
Splitting data for train and test¶
In [40]:
x_train, x_test, y_train, y_test = train_test_split(df, target,test_size=0.1, random_state=42)
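Since only about a quarter of the rows earn >50K, a stratified split would keep the class ratio identical in train and test (the cell above uses an unstratified split; this is a sketch on a toy imbalanced target standing in for `salary_binary`):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy imbalanced target: every 4th sample is positive (~25% positives)
X = np.arange(1000).reshape(-1, 1)
y = (np.arange(1000) % 4 == 0).astype(int)
# stratify=y preserves the 25/75 class ratio in both partitions
X_tr, X_te, y_tr_s, y_te_s = train_test_split(
    X, y, test_size=0.1, random_state=42, stratify=y
)
print(round(y_tr_s.mean(), 2), round(y_te_s.mean(), 2))  # 0.25 0.25
```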
Explainability¶
1. Creation of models¶
In [41]:
# This code was developed by Jose Maria Alonso-Moral
x_tr = x_train
y_tr = y_train
# XGBRegressor is fitted on the 0/1 target; it is used later only for the SHAP illustration
xgb = xgboost.XGBRegressor().fit(x_tr, y_tr)
dtc = tree.DecisionTreeClassifier()
dtc.fit(x_tr, y_tr)
dtc5 = tree.DecisionTreeClassifier(max_depth=5)
dtc5.fit(x_tr, y_tr)
gam = ExplainGAMClassifier()
gam.fit(x_tr, y_tr)
ebm = ExplainableBoostingClassifier()
ebm.fit(x_tr,y_tr)
randf = RandomForestClassifier(n_estimators=500)
randf.fit(x_tr, y_tr)
models = [ebm, gam, dtc, dtc5, randf]
models_names = ['EBM', 'GAM', 'TREE', 'TREE5', 'RF']
# %% cross-validation
print("1) cross-validation (over training data)")
scorings = ['accuracy', 'f1'] # For binary classification
i = 0
nF= 5
for mm in models:
    cv_results = cross_validate(mm, x_tr, y_tr, cv=nF,
                                scoring=scorings,
                                return_train_score=False)
    print()
    print(models_names[i] + ':')
    i = i + 1
    am1 = np.mean(cv_results['test_accuracy'])
    am1s = np.std(cv_results['test_accuracy'])
    am2 = np.mean(cv_results['test_f1'])
    am2s = np.std(cv_results['test_f1'])
    print('  Average Correct Classification Rate = ' + str('%0.3f' % am1))
    print('  Stdev Correct Classification Rate = ' + str('%0.3f' % am1s))
    print('  Average F-score = ' + str('%0.3f' % am2))
    print('  Stdev F-score = ' + str('%0.3f' % am2s))
# %% test with unknown data
print()
print("2) test with unknown data")
i=0
models_acc=[]
target_names = ['Lower than 50k', 'Higher than 50k']
for mm in models:
    print()
    print(models_names[i] + ':')
    i = i + 1
    sc = mm.score(x_test, y_test)
    # Mean accuracy of mm.predict(x_test) wrt y_test
    print("Correct Classification Rate: " + str('%0.3f' % sc))
    models_acc.append(round(sc, 3))
    y_pred = mm.predict(x_test)
    print(classification_report(y_test, y_pred, target_names=target_names))
1) cross-validation (over training data)
EBM:
Average Correct Classification Rate = 0.841
Stdev Correct Classification Rate = 0.003
Average F-score = 0.638
Stdev F-score = 0.008
GAM:
Average Correct Classification Rate = 0.840
Stdev Correct Classification Rate = 0.002
Average F-score = 0.628
Stdev F-score = 0.006
TREE:
Average Correct Classification Rate = 0.781
Stdev Correct Classification Rate = 0.005
Average F-score = 0.539
Stdev F-score = 0.012
TREE5:
Average Correct Classification Rate = 0.830
Stdev Correct Classification Rate = 0.003
Average F-score = 0.597
Stdev F-score = 0.021
RF:
Average Correct Classification Rate = 0.823
Stdev Correct Classification Rate = 0.006
Average F-score = 0.607
Stdev F-score = 0.012
2) test with unknown data
EBM:
Correct Classification Rate: 0.847
precision recall f1-score support
Lower than 50k 0.87 0.93 0.90 2456
Higher than 50k 0.74 0.58 0.65 801
accuracy 0.85 3257
macro avg 0.81 0.76 0.78 3257
weighted avg 0.84 0.85 0.84 3257
GAM:
Correct Classification Rate: 0.845
precision recall f1-score support
Lower than 50k 0.87 0.93 0.90 2456
Higher than 50k 0.74 0.57 0.64 801
accuracy 0.84 3257
macro avg 0.80 0.75 0.77 3257
weighted avg 0.84 0.84 0.84 3257
TREE:
Correct Classification Rate: 0.784
precision recall f1-score support
Lower than 50k 0.85 0.86 0.86 2456
Higher than 50k 0.56 0.54 0.55 801
accuracy 0.78 3257
macro avg 0.71 0.70 0.70 3257
weighted avg 0.78 0.78 0.78 3257
TREE5:
Correct Classification Rate: 0.833
precision recall f1-score support
Lower than 50k 0.85 0.95 0.90 2456
Higher than 50k 0.75 0.49 0.59 801
accuracy 0.83 3257
macro avg 0.80 0.72 0.74 3257
weighted avg 0.82 0.83 0.82 3257
RF:
Correct Classification Rate: 0.827
precision recall f1-score support
Lower than 50k 0.87 0.90 0.89 2456
Higher than 50k 0.67 0.59 0.63 801
accuracy 0.83 3257
macro avg 0.77 0.75 0.76 3257
weighted avg 0.82 0.83 0.82 3257
2. InterpretML for EBM¶
In [42]:
# explanations provided by interpretML
#ebm.fit(x_tr, y_tr) # execute this only once
ebm_global = ebm.explain_global()
show(ebm_global)
In [43]:
salary_ind = [1,2]
for i in salary_ind:
    idx_instance = i  # Explain the classification of instance x[idx_instance]
    ebm_local = ebm.explain_local(x_tr.iloc[idx_instance-1 : idx_instance], y_tr.iloc[idx_instance-1 : idx_instance])
    show(ebm_local)
3. SHAP Values for XGB¶
In [44]:
# Explaining with SHAP
explainer = shap.Explainer(xgb)
shap_values = explainer(x_tr)
# visualize the global prediction's explanation
shap.plots.bar(shap_values)
In [45]:
# visualizing SHAP explanations for single instances
for i in salary_ind:
    shap.plots.waterfall(shap_values[i])
4. Visualizing trees¶
In [46]:
# Visualizing entire decision tree
print(export_text(dtc5, feature_names=feature_names_array))
dot_data = tree.export_graphviz(dtc5, out_file=None, filled=True, rounded=True, special_characters=True, feature_names=feature_names_array)
graph = graphviz.Source(dot_data)
graph
|--- relationship_encoded <= 0.28
|   |--- education-num <= 13.50
|   |   |--- hours-per-week <= 42.50
|   |   |   |--- occupations_encoded <= 0.29
|   |   |   |   |--- education-num <= 12.50
|   |   |   |   |   |--- class: 0
|   |   |   |   |--- education-num > 12.50
|   |   |   |   |   |--- class: 0
|   |   |   |--- occupations_encoded > 0.29
|   |   |   |   |--- age <= 34.50
|   |   |   |   |   |--- class: 0
|   |   |   |   |--- age > 34.50
|   |   |   |   |   |--- class: 0
|   |   |--- hours-per-week > 42.50
|   |   |   |--- education-num <= 12.50
|   |   |   |   |--- age <= 38.50
|   |   |   |   |   |--- class: 0
|   |   |   |   |--- age > 38.50
|   |   |   |   |   |--- class: 0
|   |   |   |--- education-num > 12.50
|   |   |   |   |--- age <= 27.50
|   |   |   |   |   |--- class: 0
|   |   |   |   |--- age > 27.50
|   |   |   |   |   |--- class: 0
|   |--- education-num > 13.50
|   |   |--- hours-per-week <= 43.50
|   |   |   |--- education-num <= 14.50
|   |   |   |   |--- age <= 47.50
|   |   |   |   |   |--- class: 0
|   |   |   |   |--- age > 47.50
|   |   |   |   |   |--- class: 0
|   |   |   |--- education-num > 14.50
|   |   |   |   |--- age <= 31.50
|   |   |   |   |   |--- class: 0
|   |   |   |   |--- age > 31.50
|   |   |   |   |   |--- class: 0
|   |   |--- hours-per-week > 43.50
|   |   |   |--- age <= 30.50
|   |   |   |   |--- marital-status_encoded <= 0.06
|   |   |   |   |   |--- class: 0
|   |   |   |   |--- marital-status_encoded > 0.06
|   |   |   |   |   |--- class: 1
|   |   |   |--- age > 30.50
|   |   |   |   |--- occupations_encoded <= 0.47
|   |   |   |   |   |--- class: 0
|   |   |   |   |--- occupations_encoded > 0.47
|   |   |   |   |   |--- class: 1
|--- relationship_encoded > 0.28
|   |--- education-num <= 12.50
|   |   |--- occupations_encoded <= 0.25
|   |   |   |--- education-num <= 8.50
|   |   |   |   |--- occupations_encoded <= 0.17
|   |   |   |   |   |--- class: 0
|   |   |   |   |--- occupations_encoded > 0.17
|   |   |   |   |   |--- class: 0
|   |   |   |--- education-num > 8.50
|   |   |   |   |--- age <= 35.50
|   |   |   |   |   |--- class: 0
|   |   |   |   |--- age > 35.50
|   |   |   |   |   |--- class: 0
|   |   |--- occupations_encoded > 0.25
|   |   |   |--- age <= 32.50
|   |   |   |   |--- age <= 26.50
|   |   |   |   |   |--- class: 0
|   |   |   |   |--- age > 26.50
|   |   |   |   |   |--- class: 0
|   |   |   |--- age > 32.50
|   |   |   |   |--- education-num <= 9.50
|   |   |   |   |   |--- class: 0
|   |   |   |   |--- education-num > 9.50
|   |   |   |   |   |--- class: 1
|   |--- education-num > 12.50
|   |   |--- occupations_encoded <= 0.25
|   |   |   |--- hours-per-week <= 41.50
|   |   |   |   |--- occupations_encoded <= 0.08
|   |   |   |   |   |--- class: 0
|   |   |   |   |--- occupations_encoded > 0.08
|   |   |   |   |   |--- class: 0
|   |   |   |--- hours-per-week > 41.50
|   |   |   |   |--- workclass_encoded <= 0.28
|   |   |   |   |   |--- class: 1
|   |   |   |   |--- workclass_encoded > 0.28
|   |   |   |   |   |--- class: 0
|   |   |--- occupations_encoded > 0.25
|   |   |   |--- age <= 28.50
|   |   |   |   |--- age <= 25.50
|   |   |   |   |   |--- class: 0
|   |   |   |   |--- age > 25.50
|   |   |   |   |   |--- class: 1
|   |   |   |--- age > 28.50
|   |   |   |   |--- hours-per-week <= 31.00
|   |   |   |   |   |--- class: 1
|   |   |   |   |--- hours-per-week > 31.00
|   |   |   |   |   |--- class: 1
Out[46]:
5. Explanations for GAM model¶
In [47]:
# %% Plot global factual explanation
gam.explain_global_importances(x_tr, figsize=(20, 5))
# %% Plot global counterfactual explanation
gam.plot_counterfactual_importance(x_tr, weighted=True, how_many_plots=2, figsize=(21, 7))
100%|███████████████████████████████████████████| 10/10 [06:44<00:00, 40.44s/it]
6. Accuracy vs Explainability Tradeoff¶
In [48]:
def plot_pareto_front(x, y, n, ylab, xlab, minx, maxx):
    # x holds the interpretability scores, y the accuracies;
    # ylab labels the y-axis and xlab the x-axis
    plt.title("Pareto Front")
    plt.ylabel(ylab)
    plt.xlabel(xlab)
    plt.axis([minx, maxx, 0, 1])
    c = ["ro", "bo", "go", "rs", "bs", "gs", "r*", "b*", "g*", "r+", "b+", "g+"]
    for m in n:
        m_idx = n.index(m)
        plt.plot(x[m_idx], y[m_idx], c[m_idx], label=m)
    plt.grid(True)
    plt.legend()
    plt.show()
In [49]:
# Building and visualizing the Pareto front with Accuracy versus Number of leaves
lim= 175
m_int=[dtc.get_n_leaves(), dtc5.get_n_leaves(), lim]
# RF is placed at the limit (lim) because it is a black-box model: there is no fair way to measure its complexity in terms of rules/leaves
# dtc is the fully grown tree
# dtc5 has a maximum tree depth of 5
m_acc=[models_acc[2], models_acc[3], models_acc[4]]
m_nam= [models_names[2], models_names[3], models_names[4]]
print(m_int)
print(m_acc)
print(m_nam)
plt.figure(figsize=[15,10])
plot_pareto_front(m_int,m_acc,m_nam,'Accuracy (Classification Ratio)','Interpretability (Num of rules/leaves)',20,lim)
[6213, 32, 175] [0.784, 0.833, 0.827] ['TREE', 'TREE5', 'RF']
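The scatter above shows each model, but which ones are actually Pareto-optimal (no other model is both simpler and at least as accurate)? A minimal dominance check can be sketched as follows; this is illustrative only and assumes, as in the plot, that fewer leaves/rules means higher interpretability.

```python
import numpy as np

def pareto_optimal(complexity, accuracy):
    """Boolean mask: True where no other model is both simpler
    (lower complexity) and at least as accurate, with at least
    one of the two comparisons strict."""
    complexity = np.asarray(complexity, dtype=float)
    accuracy = np.asarray(accuracy, dtype=float)
    mask = []
    for i in range(len(complexity)):
        dominated = any(
            complexity[j] <= complexity[i] and accuracy[j] >= accuracy[i]
            and (complexity[j] < complexity[i] or accuracy[j] > accuracy[i])
            for j in range(len(complexity))
        )
        mask.append(not dominated)
    return mask

# Values from the notebook: TREE (6213 leaves), TREE5 (32), RF (placed at 175)
print(pareto_optimal([6213, 32, 175], [0.784, 0.833, 0.827]))  # → [False, True, False]
```

Only TREE5 survives: it is both the most accurate and the simplest of the three, so the other two are dominated.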
In [50]:
def get_shap_explanation_length(single_lower_triangular_interactions, indexes=None, th=0.9):
    # Cumulative sum of absolute SHAP values
    shap_cumsum = np.cumsum(np.abs(single_lower_triangular_interactions))
    # Normalise the cumulative sum so the last element equals 1
    normalised_shap_cumsum = shap_cumsum / shap_cumsum[-1]
    # Find the index of the first element that exceeds the threshold
    first_above_idx = 0
    for i, val in enumerate(normalised_shap_cumsum):
        if val > th:
            first_above_idx = i
            break
    # The SHAP explanation length is that index
    return first_above_idx
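To sanity-check the logic, here is the same cumulative-mass computation on a hand-made attribution vector (a standalone sketch: the function is reproduced so the example runs on its own). With `th=0.9`, the returned index is the position at which the normalised cumulative absolute attribution first exceeds 90% of the total.

```python
import numpy as np

def shap_explanation_length(attributions, th=0.9):
    # Cumulative absolute attribution mass, normalised to 1
    cumsum = np.cumsum(np.abs(attributions))
    normalised = cumsum / cumsum[-1]
    # Index of the first value strictly above the threshold
    for i, val in enumerate(normalised):
        if val > th:
            return i
    return 0

# Cumulative mass: [0.5, 0.8, 0.9, 0.95, 1.0]; the 90% mark is first
# exceeded at index 3
print(shap_explanation_length([5, 3, 1, 0.5, 0.5]))  # → 3
```

Note the mass is accumulated in the order the attributions are given; sorting them by magnitude first would yield the shortest explanation.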
In [51]:
SL=[]
explainerTree = shap.TreeExplainer(dtc)
shap_values_DTC = explainerTree.shap_values(x_test)
SL.append(get_shap_explanation_length(shap_values_DTC[0]))
explainerTree5 = shap.TreeExplainer(dtc5)
shap_values_DTC5 = explainerTree5.shap_values(x_test)
SL.append(get_shap_explanation_length(shap_values_DTC5[0]))
explainerTreeRF = shap.TreeExplainer(randf)
shap_values_RF = explainerTreeRF.shap_values(x_test)
SL.append(get_shap_explanation_length(shap_values_RF[0]))
print(SL)
[29360, 29360, 29341]
In [52]:
m_int=SL
m_acc=[models_acc[2], models_acc[3], models_acc[4]]
m_nam= [models_names[2], models_names[3], models_names[4]]
print(SL)
print(m_acc)
print(m_nam)
plt.figure(figsize=[15,10])
# Axis limits must cover the SHAP-length values (~29k), otherwise the points fall outside the plot
plot_pareto_front(m_int,m_acc,m_nam,'Accuracy (Classification Ratio)','Interpretability (Shap Length)',min(SL)-20,max(SL)+20)
[29360, 29360, 29341] [0.784, 0.833, 0.827] ['TREE', 'TREE5', 'RF']
Finding bias¶
In [53]:
df
Out[53]:
| age | education-num | hours-per-week | native-country_encoded | workclass_encoded | marital-status_encoded | occupations_encoded | relationship_encoded | race_encoded | sex_encoded | |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 39 | 13 | 40 | 0.245835 | 0.271957 | 0.045961 | 0.134483 | 0.103070 | 0.25586 | 1 |
| 1 | 50 | 13 | 13 | 0.245835 | 0.284927 | 0.446848 | 0.484014 | 0.448571 | 0.25586 | 1 |
| 2 | 38 | 9 | 40 | 0.245835 | 0.218673 | 0.104209 | 0.062774 | 0.103070 | 0.25586 | 1 |
| 3 | 53 | 7 | 40 | 0.245835 | 0.218673 | 0.446848 | 0.062774 | 0.448571 | 0.12388 | 1 |
| 4 | 28 | 13 | 40 | 0.263146 | 0.218673 | 0.446848 | 0.449034 | 0.475128 | 0.12388 | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 32556 | 27 | 12 | 38 | 0.245835 | 0.218673 | 0.446848 | 0.304957 | 0.475128 | 0.25586 | 0 |
| 32557 | 40 | 9 | 40 | 0.245835 | 0.218673 | 0.446848 | 0.124875 | 0.448571 | 0.25586 | 1 |
| 32558 | 58 | 9 | 40 | 0.245835 | 0.218673 | 0.085599 | 0.134483 | 0.063262 | 0.25586 | 0 |
| 32559 | 22 | 9 | 20 | 0.245835 | 0.218673 | 0.045961 | 0.134483 | 0.013220 | 0.25586 | 1 |
| 32560 | 52 | 9 | 40 | 0.245835 | 0.557348 | 0.446848 | 0.484014 | 0.475128 | 0.25586 | 0 |
32561 rows × 10 columns
In [54]:
df_X = df
df_y = target.astype(int)
df_feature_names = df_X.columns.tolist()
df_class_names = target.unique()
df_X=df_X.to_numpy()
df_y=df_y.to_numpy()
df_feature_names=np.array(df_feature_names)
df_class_names=np.array(df_class_names)
print(df_X)
print(df_feature_names)
print(df_class_names)
[[3.90000000e+01 1.30000000e+01 4.00000000e+01 ... 1.03070439e-01 2.55859937e-01 1.00000000e+00]
 [5.00000000e+01 1.30000000e+01 1.30000000e+01 ... 4.48571212e-01 2.55859937e-01 1.00000000e+00]
 [3.80000000e+01 9.00000000e+00 4.00000000e+01 ... 1.03070439e-01 2.55859937e-01 1.00000000e+00]
 ...
 [5.80000000e+01 9.00000000e+00 4.00000000e+01 ... 6.32617528e-02 2.55859937e-01 0.00000000e+00]
 [2.20000000e+01 9.00000000e+00 2.00000000e+01 ... 1.32202052e-02 2.55859937e-01 1.00000000e+00]
 [5.20000000e+01 9.00000000e+00 4.00000000e+01 ... 4.75127551e-01 2.55859937e-01 0.00000000e+00]]
['age' 'education-num' 'hours-per-week' 'native-country_encoded' 'workclass_encoded' 'marital-status_encoded' 'occupations_encoded' 'relationship_encoded' 'race_encoded' 'sex_encoded']
[0 1]
Sampling bias and systematic performance bias with race¶
In [55]:
print(df['race_encoded'].max())
print(df['race_encoded'].min())
print(df['race_encoded'].unique())
0.26564003849855633
0.09225092251109193
[0.25585994 0.12387964 0.26564004 0.11575563 0.09225092]
In [56]:
# Select the feature for which Sampling Bias will be measured
selected_feature_index = 8
selected_feature_name = df_feature_names[selected_feature_index]
# Define grouping on the selected feature
selected_feature_groups = [0.10, 0.12, 0.14, 0.26]
In [57]:
selected_feature_grouping = fatf_data_tools.group_by_column(
df_X,
selected_feature_index,
groupings=selected_feature_groups)
selected_feature_grouping[1]
Out[57]:
['x <= 0.1', '0.1 < x <= 0.12', '0.12 < x <= 0.14', '0.14 < x <= 0.26', '0.26 < x']
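`group_by_column` partitions rows by thresholding a single column. The same binning can be sketched with `np.searchsorted` (illustrative only, not the FAT Forensics implementation); the toy values below are the unique `race_encoded` codes printed earlier.

```python
import numpy as np

thresholds = [0.10, 0.12, 0.14, 0.26]
values = np.array([0.25585994, 0.12387964, 0.26564004, 0.11575563, 0.09225092])

# side='left' reproduces "x <= t" bin edges: bin 0 is x <= 0.10,
# bin 4 is x > 0.26 -- matching the group labels shown above
bins = np.searchsorted(thresholds, values, side='left')
print(bins)  # → [3 2 4 1 0]
```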
In [58]:
print(len(selected_feature_grouping[0][0]))
print(len(selected_feature_grouping[0][1]))
print(len(selected_feature_grouping[0][2]))
print(len(selected_feature_grouping[0][3]))
print(len(selected_feature_grouping[0][4]))
271
311
3124
27816
1039
In [59]:
counts_per_grouping = [len(i) for i in selected_feature_grouping[0]]
print(counts_per_grouping)
fatf_accountability_data.sampling_bias_grid_check(counts_per_grouping)
[271, 311, 3124, 27816, 1039]
Out[59]:
array([[False, False, True, True, True],
[False, False, True, True, True],
[ True, True, False, True, True],
[ True, True, True, False, True],
[ True, True, True, True, False]])
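The grid flags every pair of groups whose sizes are too far apart. The exact rule and threshold are internal to FAT Forensics, but the grid above is consistent with flagging a pair when the smaller count falls below roughly 5/6 of the larger; the sketch below reimplements the check under that assumption.

```python
import numpy as np

def sampling_bias_grid(counts, threshold=5 / 6):
    """Pairwise disparity check (assumed rule: flag a pair when the
    smaller count is below `threshold` times the larger one)."""
    counts = np.asarray(counts, dtype=float)
    n = len(counts)
    grid = np.zeros((n, n), dtype=bool)
    for i in range(n):
        for j in range(n):
            if i != j:
                lo, hi = sorted((counts[i], counts[j]))
                # hi == 0 guard avoids 0/0 for two empty groups
                grid[i, j] = (hi == 0) or (lo / hi < threshold)
    return grid

print(sampling_bias_grid([271, 311, 3124, 27816, 1039]))
```

With these counts, only the pair (271, 311) is balanced enough to pass, matching the grid printed above.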
In [60]:
# Get the disparity grid
bias_grid = fatf_dam.sampling_bias_grid_check(counts_per_grouping)
print(bias_grid)
# Print out disparity per every grouping pair
print('\nThe Sampling Bias for *{}* feature (index {}) grouping is:'
''.format(selected_feature_name, selected_feature_index))
for grouping_i, grouping_name_i in enumerate(counts_per_grouping):
j_offset = grouping_i + 1
for grouping_j, grouping_name_j in enumerate(counts_per_grouping[j_offset:]):
grouping_j += j_offset
is_not = '' if bias_grid[grouping_i, grouping_j] else ' NO'
print(' * For "{}" and "{}" groupings there is{} Sampling Bias.'
''.format(grouping_name_i, grouping_name_j, is_not))
[[False False True True True]
[False False True True True]
[ True True False True True]
[ True True True False True]
[ True True True True False]]
The Sampling Bias for *race_encoded* feature (index 8) grouping is:
* For "271" and "311" groupings there is NO Sampling Bias.
* For "271" and "3124" groupings there is Sampling Bias.
* For "271" and "27816" groupings there is Sampling Bias.
* For "271" and "1039" groupings there is Sampling Bias.
* For "311" and "3124" groupings there is Sampling Bias.
* For "311" and "27816" groupings there is Sampling Bias.
* For "311" and "1039" groupings there is Sampling Bias.
* For "3124" and "27816" groupings there is Sampling Bias.
* For "3124" and "1039" groupings there is Sampling Bias.
* For "27816" and "1039" groupings there is Sampling Bias.
In [61]:
dtc5 = tree.DecisionTreeClassifier(max_depth=5)
dtc5.fit(df_X, df_y)
df_pred = dtc5.predict(df_X)
In [62]:
grouping_cm = fatf_metrics_tools.confusion_matrix_per_subgroup_indexed(
selected_feature_grouping[0],
df_y,
df_pred,
labels=np.unique(df_y).tolist())
In [63]:
print('First subgroup')
print('Targets: ', df_y[selected_feature_grouping[0][0]])
print('Unique targets: ', np.unique(df_y[selected_feature_grouping[0][0]]))
print('Unique predictions: ', np.unique(df_pred[selected_feature_grouping[0][0]]))
print('Confusion matrix: ')
print(grouping_cm[0])
First subgroup
Targets:  [0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 1 0 1 0 0 1 0 0 0 0 0 1 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0]
Unique targets:  [0 1]
Unique predictions:  [0 1]
Confusion matrix: 
[[236  15]
 [ 10  10]]
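Each subgroup's confusion matrix is simply the standard confusion matrix restricted to that subgroup's row indices. A minimal sketch using toy data and sklearn's `confusion_matrix` (illustrative; FAT Forensics computes the same thing internally, though it may order the axes differently):

```python
import numpy as np
from sklearn.metrics import confusion_matrix

y_true = np.array([0, 0, 1, 1, 0, 1])
y_pred = np.array([0, 1, 1, 1, 0, 0])
subgroup_rows = [0, 1, 2, 3]  # indices belonging to one subgroup

# Restrict both vectors to the subgroup, then tabulate as usual
cm = confusion_matrix(y_true[subgroup_rows], y_pred[subgroup_rows], labels=[0, 1])
print(cm)
```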
In [64]:
print('Second subgroup')
print('Confusion matrix: ')
print(grouping_cm[1])
print('Third subgroup')
print('Confusion matrix: ')
print(grouping_cm[2])
print('Fourth subgroup')
print('Confusion matrix: ')
print(grouping_cm[3])
print('Fifth subgroup')
print('Confusion matrix: ')
print(grouping_cm[4])
Second subgroup
Confusion matrix: 
[[265  26]
 [ 10  10]]
Third subgroup
Confusion matrix: 
[[2677  251]
 [  60  136]]
Fourth subgroup
Confusion matrix: 
[[19411  3541]
 [ 1288  3576]]
Fifth subgroup
Confusion matrix: 
[[687 131]
 [ 76 145]]
In [65]:
group_0_acc = fatf_metrics.accuracy(grouping_cm[0])
print(group_0_acc)
group_1_acc = fatf_metrics.accuracy(grouping_cm[1])
print(group_1_acc)
group_2_acc = fatf_metrics.accuracy(grouping_cm[2])
print(group_2_acc)
group_3_acc = fatf_metrics.accuracy(grouping_cm[3])
print(group_3_acc)
group_4_acc = fatf_metrics.accuracy(grouping_cm[4])
print(group_4_acc)
0.9077490774907749
0.8842443729903537
0.9004481434058899
0.8263948806442335
0.8007699711260827
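`fatf_metrics.accuracy` reduces to trace over sum of the confusion matrix. Checking the first subgroup by hand, with the matrix printed above:

```python
import numpy as np

# Confusion matrix of the first race subgroup
cm = np.array([[236, 15],
               [10, 10]])

# Accuracy = correctly classified (diagonal) / all instances
accuracy = np.trace(cm) / cm.sum()
print(accuracy)  # matches the first value printed above (≈0.90775)
```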
In [66]:
fatf_accountability_models.systematic_performance_bias_grid([group_0_acc, group_1_acc, group_2_acc,group_3_acc,group_4_acc])
Out[66]:
array([[False, False, False, False, False],
[False, False, False, False, False],
[False, False, False, False, False],
[False, False, False, False, False],
[False, False, False, False, False]])
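The performance-bias grid applies the same pairwise idea to metric values instead of counts: here the subgroup accuracies never differ by more than about 11%, so no pair is flagged. A sketch under the same assumed ratio rule as the sampling-bias sketch (the actual FAT Forensics rule and threshold may differ):

```python
import numpy as np

def performance_bias_grid(metrics, threshold=5 / 6):
    """Assumed rule: flag a pair when the lower metric is below
    `threshold` times the higher one."""
    metrics = np.asarray(metrics, dtype=float)
    n = len(metrics)
    grid = np.zeros((n, n), dtype=bool)
    for i in range(n):
        for j in range(n):
            if i != j:
                lo, hi = sorted((metrics[i], metrics[j]))
                grid[i, j] = hi > 0 and lo / hi < threshold
    return grid

# Subgroup accuracies from the cell above: all ratios stay above 5/6,
# so the whole grid is False
race_acc = [0.9077, 0.8842, 0.9004, 0.8264, 0.8008]
print(performance_bias_grid(race_acc).any())  # → False
```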
In [67]:
# Select a predictive performance metric
predictive_performance_metric = 'accuracy'
#predictive_performance_metric = 'positive predictive value' # Notice a class index needs to be selected using "label_index" property
#predictive_performance_metric = 'false negative rate' # Notice a class index needs to be selected using "label_index" property
# Select a feature for which the difference in performance should be measured
selected_feature_index = 8
selected_feature_name = df_feature_names[selected_feature_index]
# Define grouping on the selected feature
selected_feature_groups = [0.10, 0.12, 0.14, 0.26]
selected_feature_grouping = fatf_data_tools.group_by_column(
df_X,
selected_feature_index,
groupings=selected_feature_groups)
print('Grouping using variable', df_feature_names[selected_feature_index], ' as ', selected_feature_grouping[1])
grouping_cm = fatf_metrics_tools.confusion_matrix_per_subgroup_indexed(
selected_feature_grouping[0],
df_y,
df_pred,
labels=np.unique(df_y).tolist())
print('Resulting confusion matrices:')
for i in range(len(grouping_cm)):
print('Confusion matrix', i, ':')
print(grouping_cm[i])
# Compute performance per group in the feature
population_metrics, population_names = fatf_smt.performance_per_subgroup(
df_X,
df_y,
df_pred,
selected_feature_index,
groupings=selected_feature_groups,
metric=predictive_performance_metric,
    label_index=2)  # label_index matters only for per-class metrics such as PPV/FNR
#print(population_metrics)
# Print out performance per grouping
print('The *{}* for groups defined on "{}" feature (feature index '
'{}):'.format(predictive_performance_metric, selected_feature_name,
selected_feature_index))
for p_name, p_metric in zip(population_names, population_metrics):
print(' * For the population split *{}* the {} is: '
'{:.2f}.'.format(p_name, predictive_performance_metric, p_metric))
# Evaluate Systematic Performance Bias
bias_grid = fatf_accountability_models.systematic_performance_bias_grid(population_metrics)
print('Systematic performance grid check:')
print(bias_grid)
# Print out Systematic Performance Bias for each grouping pair
print('\nThe *{}-based* Systematic Performance Bias for *{}* feature (index '
'{}) grouping is:'.format(predictive_performance_metric,
selected_feature_name, selected_feature_index))
for grouping_i, grouping_name_i in enumerate(population_names):
j_offset = grouping_i + 1
for grouping_j, grouping_name_j in enumerate(population_names[j_offset:]):
grouping_j += j_offset
is_not = '' if bias_grid[grouping_i, grouping_j] else ' NO'
print(' * For "{}" and "{}" groupings there is{} Systematic '
'Performance Bias.'.format(grouping_name_i, grouping_name_j,
is_not))
Grouping using variable race_encoded as ['x <= 0.1', '0.1 < x <= 0.12', '0.12 < x <= 0.14', '0.14 < x <= 0.26', '0.26 < x']
Resulting confusion matrices:
Confusion matrix 0 :
[[236 15]
[ 10 10]]
Confusion matrix 1 :
[[265 26]
[ 10 10]]
Confusion matrix 2 :
[[2677 251]
[ 60 136]]
Confusion matrix 3 :
[[19411 3541]
[ 1288 3576]]
Confusion matrix 4 :
[[687 131]
[ 76 145]]
The *accuracy* for groups defined on "race_encoded" feature (feature index 8):
* For the population split *x <= 0.1* the accuracy is: 0.91.
* For the population split *0.1 < x <= 0.12* the accuracy is: 0.88.
* For the population split *0.12 < x <= 0.14* the accuracy is: 0.90.
* For the population split *0.14 < x <= 0.26* the accuracy is: 0.83.
* For the population split *0.26 < x* the accuracy is: 0.80.
Systematic performance grid check:
[[False False False False False]
[False False False False False]
[False False False False False]
[False False False False False]
[False False False False False]]
The *accuracy-based* Systematic Performance Bias for *race_encoded* feature (index 8) grouping is:
* For "x <= 0.1" and "0.1 < x <= 0.12" groupings there is NO Systematic Performance Bias.
* For "x <= 0.1" and "0.12 < x <= 0.14" groupings there is NO Systematic Performance Bias.
* For "x <= 0.1" and "0.14 < x <= 0.26" groupings there is NO Systematic Performance Bias.
* For "x <= 0.1" and "0.26 < x" groupings there is NO Systematic Performance Bias.
* For "0.1 < x <= 0.12" and "0.12 < x <= 0.14" groupings there is NO Systematic Performance Bias.
* For "0.1 < x <= 0.12" and "0.14 < x <= 0.26" groupings there is NO Systematic Performance Bias.
* For "0.1 < x <= 0.12" and "0.26 < x" groupings there is NO Systematic Performance Bias.
* For "0.12 < x <= 0.14" and "0.14 < x <= 0.26" groupings there is NO Systematic Performance Bias.
* For "0.12 < x <= 0.14" and "0.26 < x" groupings there is NO Systematic Performance Bias.
* For "0.14 < x <= 0.26" and "0.26 < x" groupings there is NO Systematic Performance Bias.
Sampling bias and systematic performance bias with education¶
In [68]:
print(df['education-num'].max())
print(df['education-num'].min())
print(df['education-num'].unique())
16
1
[13 9 7 14 5 10 12 11 4 16 15 3 6 2 1 8]
In [69]:
# Select the feature for which Sampling Bias will be measured
selected_feature_index = 1
selected_feature_name = df_feature_names[selected_feature_index]
# Define grouping on the selected feature
# Note: the last threshold (16.5) creates an empty "16.5 < x" group, since education-num tops out at 16
selected_feature_groups = [1.5, 2.5, 3.5, 4.5, 5.5, 6.5, 7.5, 8.5, 9.5, 10.5, 11.5, 12.5, 13.5, 14.5, 15.5, 16.5]
In [70]:
selected_feature_grouping = fatf_data_tools.group_by_column(
df_X,
selected_feature_index,
groupings=selected_feature_groups)
selected_feature_grouping[1]
Out[70]:
['x <= 1.5', '1.5 < x <= 2.5', '2.5 < x <= 3.5', '3.5 < x <= 4.5', '4.5 < x <= 5.5', '5.5 < x <= 6.5', '6.5 < x <= 7.5', '7.5 < x <= 8.5', '8.5 < x <= 9.5', '9.5 < x <= 10.5', '10.5 < x <= 11.5', '11.5 < x <= 12.5', '12.5 < x <= 13.5', '13.5 < x <= 14.5', '14.5 < x <= 15.5', '15.5 < x <= 16.5', '16.5 < x']
In [71]:
# Note: iterating over the 16 thresholds skips the final (empty) "16.5 < x" group
for i in range(len(selected_feature_groups)):
    print(len(selected_feature_grouping[0][i]))
51
168
333
646
514
933
1175
433
10501
7291
1382
1067
5355
1723
576
413
In [72]:
counts_per_grouping = [len(i) for i in selected_feature_grouping[0]]
print(counts_per_grouping)
fatf_accountability_data.sampling_bias_grid_check(counts_per_grouping)
[51, 168, 333, 646, 514, 933, 1175, 433, 10501, 7291, 1382, 1067, 5355, 1723, 576, 413, 0]
Out[72]:
array([[False, True, True, True, True, True, True, True, True,
True, True, True, True, True, True, True, True],
[ True, False, True, True, True, True, True, True, True,
True, True, True, True, True, True, True, True],
[ True, True, False, True, True, True, True, True, True,
True, True, True, True, True, True, True, True],
[ True, True, True, False, True, True, True, True, True,
True, True, True, True, True, False, True, True],
[ True, True, True, True, False, True, True, False, True,
True, True, True, True, True, False, True, True],
[ True, True, True, True, True, False, True, True, True,
True, True, False, True, True, True, True, True],
[ True, True, True, True, True, True, False, True, True,
True, False, False, True, True, True, True, True],
[ True, True, True, True, False, True, True, False, True,
True, True, True, True, True, True, False, True],
[ True, True, True, True, True, True, True, True, False,
True, True, True, True, True, True, True, True],
[ True, True, True, True, True, True, True, True, True,
False, True, True, True, True, True, True, True],
[ True, True, True, True, True, True, False, True, True,
True, False, True, True, True, True, True, True],
[ True, True, True, True, True, False, False, True, True,
True, True, False, True, True, True, True, True],
[ True, True, True, True, True, True, True, True, True,
True, True, True, False, True, True, True, True],
[ True, True, True, True, True, True, True, True, True,
True, True, True, True, False, True, True, True],
[ True, True, True, False, False, True, True, True, True,
True, True, True, True, True, False, True, True],
[ True, True, True, True, True, True, True, False, True,
True, True, True, True, True, True, False, True],
[ True, True, True, True, True, True, True, True, True,
True, True, True, True, True, True, True, False]])
In [73]:
# Get the disparity grid
bias_grid = fatf_dam.sampling_bias_grid_check(counts_per_grouping)
print(bias_grid)
# Print out disparity per every grouping pair
print('\nThe Sampling Bias for *{}* feature (index {}) grouping is:'
''.format(selected_feature_name, selected_feature_index))
for grouping_i, grouping_name_i in enumerate(counts_per_grouping):
j_offset = grouping_i + 1
for grouping_j, grouping_name_j in enumerate(counts_per_grouping[j_offset:]):
grouping_j += j_offset
is_not = '' if bias_grid[grouping_i, grouping_j] else ' NO'
print(' * For "{}" and "{}" groupings there is{} Sampling Bias.'
''.format(grouping_name_i, grouping_name_j, is_not))
[[False True True True True True True True True True True True
True True True True True]
[ True False True True True True True True True True True True
True True True True True]
[ True True False True True True True True True True True True
True True True True True]
[ True True True False True True True True True True True True
True True False True True]
[ True True True True False True True False True True True True
True True False True True]
[ True True True True True False True True True True True False
True True True True True]
[ True True True True True True False True True True False False
True True True True True]
[ True True True True False True True False True True True True
True True True False True]
[ True True True True True True True True False True True True
True True True True True]
[ True True True True True True True True True False True True
True True True True True]
[ True True True True True True False True True True False True
True True True True True]
[ True True True True True False False True True True True False
True True True True True]
[ True True True True True True True True True True True True
False True True True True]
[ True True True True True True True True True True True True
True False True True True]
[ True True True False False True True True True True True True
True True False True True]
[ True True True True True True True False True True True True
True True True False True]
[ True True True True True True True True True True True True
True True True True False]]
The Sampling Bias for *education-num* feature (index 1) grouping is:
* For "51" and "168" groupings there is Sampling Bias.
* For "51" and "333" groupings there is Sampling Bias.
* For "51" and "646" groupings there is Sampling Bias.
* For "51" and "514" groupings there is Sampling Bias.
* For "51" and "933" groupings there is Sampling Bias.
* For "51" and "1175" groupings there is Sampling Bias.
* For "51" and "433" groupings there is Sampling Bias.
* For "51" and "10501" groupings there is Sampling Bias.
* For "51" and "7291" groupings there is Sampling Bias.
* For "51" and "1382" groupings there is Sampling Bias.
* For "51" and "1067" groupings there is Sampling Bias.
* For "51" and "5355" groupings there is Sampling Bias.
* For "51" and "1723" groupings there is Sampling Bias.
* For "51" and "576" groupings there is Sampling Bias.
* For "51" and "413" groupings there is Sampling Bias.
* For "51" and "0" groupings there is Sampling Bias.
* For "168" and "333" groupings there is Sampling Bias.
* For "168" and "646" groupings there is Sampling Bias.
* For "168" and "514" groupings there is Sampling Bias.
* For "168" and "933" groupings there is Sampling Bias.
* For "168" and "1175" groupings there is Sampling Bias.
* For "168" and "433" groupings there is Sampling Bias.
* For "168" and "10501" groupings there is Sampling Bias.
* For "168" and "7291" groupings there is Sampling Bias.
* For "168" and "1382" groupings there is Sampling Bias.
* For "168" and "1067" groupings there is Sampling Bias.
* For "168" and "5355" groupings there is Sampling Bias.
* For "168" and "1723" groupings there is Sampling Bias.
* For "168" and "576" groupings there is Sampling Bias.
* For "168" and "413" groupings there is Sampling Bias.
* For "168" and "0" groupings there is Sampling Bias.
* For "333" and "646" groupings there is Sampling Bias.
* For "333" and "514" groupings there is Sampling Bias.
* For "333" and "933" groupings there is Sampling Bias.
* For "333" and "1175" groupings there is Sampling Bias.
* For "333" and "433" groupings there is Sampling Bias.
* For "333" and "10501" groupings there is Sampling Bias.
* For "333" and "7291" groupings there is Sampling Bias.
* For "333" and "1382" groupings there is Sampling Bias.
* For "333" and "1067" groupings there is Sampling Bias.
* For "333" and "5355" groupings there is Sampling Bias.
* For "333" and "1723" groupings there is Sampling Bias.
* For "333" and "576" groupings there is Sampling Bias.
* For "333" and "413" groupings there is Sampling Bias.
* For "333" and "0" groupings there is Sampling Bias.
* For "646" and "514" groupings there is Sampling Bias.
* For "646" and "933" groupings there is Sampling Bias.
* For "646" and "1175" groupings there is Sampling Bias.
* For "646" and "433" groupings there is Sampling Bias.
* For "646" and "10501" groupings there is Sampling Bias.
* For "646" and "7291" groupings there is Sampling Bias.
* For "646" and "1382" groupings there is Sampling Bias.
* For "646" and "1067" groupings there is Sampling Bias.
* For "646" and "5355" groupings there is Sampling Bias.
* For "646" and "1723" groupings there is Sampling Bias.
* For "646" and "576" groupings there is NO Sampling Bias.
* For "646" and "413" groupings there is Sampling Bias.
* For "646" and "0" groupings there is Sampling Bias.
* For "514" and "933" groupings there is Sampling Bias.
* For "514" and "1175" groupings there is Sampling Bias.
* For "514" and "433" groupings there is NO Sampling Bias.
* For "514" and "10501" groupings there is Sampling Bias.
* For "514" and "7291" groupings there is Sampling Bias.
* For "514" and "1382" groupings there is Sampling Bias.
* For "514" and "1067" groupings there is Sampling Bias.
* For "514" and "5355" groupings there is Sampling Bias.
* For "514" and "1723" groupings there is Sampling Bias.
* For "514" and "576" groupings there is NO Sampling Bias.
* For "514" and "413" groupings there is Sampling Bias.
* For "514" and "0" groupings there is Sampling Bias.
* For "933" and "1175" groupings there is Sampling Bias.
* For "933" and "433" groupings there is Sampling Bias.
* For "933" and "10501" groupings there is Sampling Bias.
* For "933" and "7291" groupings there is Sampling Bias.
* For "933" and "1382" groupings there is Sampling Bias.
* For "933" and "1067" groupings there is NO Sampling Bias.
* For "933" and "5355" groupings there is Sampling Bias.
* For "933" and "1723" groupings there is Sampling Bias.
* For "933" and "576" groupings there is Sampling Bias.
* For "933" and "413" groupings there is Sampling Bias.
* For "933" and "0" groupings there is Sampling Bias.
* For "1175" and "433" groupings there is Sampling Bias.
* For "1175" and "10501" groupings there is Sampling Bias.
* For "1175" and "7291" groupings there is Sampling Bias.
* For "1175" and "1382" groupings there is NO Sampling Bias.
* For "1175" and "1067" groupings there is NO Sampling Bias.
* For "1175" and "5355" groupings there is Sampling Bias.
* For "1175" and "1723" groupings there is Sampling Bias.
* For "1175" and "576" groupings there is Sampling Bias.
* For "1175" and "413" groupings there is Sampling Bias.
* For "1175" and "0" groupings there is Sampling Bias.
* For "433" and "10501" groupings there is Sampling Bias.
* For "433" and "7291" groupings there is Sampling Bias.
* For "433" and "1382" groupings there is Sampling Bias.
* For "433" and "1067" groupings there is Sampling Bias.
* For "433" and "5355" groupings there is Sampling Bias.
* For "433" and "1723" groupings there is Sampling Bias.
* For "433" and "576" groupings there is Sampling Bias.
* For "433" and "413" groupings there is NO Sampling Bias.
* For "433" and "0" groupings there is Sampling Bias.
* For "10501" and "7291" groupings there is Sampling Bias.
* For "10501" and "1382" groupings there is Sampling Bias.
* For "10501" and "1067" groupings there is Sampling Bias.
* For "10501" and "5355" groupings there is Sampling Bias.
* For "10501" and "1723" groupings there is Sampling Bias.
* For "10501" and "576" groupings there is Sampling Bias.
* For "10501" and "413" groupings there is Sampling Bias.
* For "10501" and "0" groupings there is Sampling Bias.
* For "7291" and "1382" groupings there is Sampling Bias.
* For "7291" and "1067" groupings there is Sampling Bias.
* For "7291" and "5355" groupings there is Sampling Bias.
* For "7291" and "1723" groupings there is Sampling Bias.
* For "7291" and "576" groupings there is Sampling Bias.
* For "7291" and "413" groupings there is Sampling Bias.
* For "7291" and "0" groupings there is Sampling Bias.
* For "1382" and "1067" groupings there is Sampling Bias.
* For "1382" and "5355" groupings there is Sampling Bias.
* For "1382" and "1723" groupings there is Sampling Bias.
* For "1382" and "576" groupings there is Sampling Bias.
* For "1382" and "413" groupings there is Sampling Bias.
* For "1382" and "0" groupings there is Sampling Bias.
* For "1067" and "5355" groupings there is Sampling Bias.
* For "1067" and "1723" groupings there is Sampling Bias.
* For "1067" and "576" groupings there is Sampling Bias.
* For "1067" and "413" groupings there is Sampling Bias.
* For "1067" and "0" groupings there is Sampling Bias.
* For "5355" and "1723" groupings there is Sampling Bias.
* For "5355" and "576" groupings there is Sampling Bias.
* For "5355" and "413" groupings there is Sampling Bias.
* For "5355" and "0" groupings there is Sampling Bias.
* For "1723" and "576" groupings there is Sampling Bias.
* For "1723" and "413" groupings there is Sampling Bias.
* For "1723" and "0" groupings there is Sampling Bias.
* For "576" and "413" groupings there is Sampling Bias.
* For "576" and "0" groupings there is Sampling Bias.
* For "413" and "0" groupings there is Sampling Bias.
In [74]:
dtc5 = tree.DecisionTreeClassifier(max_depth=5)
dtc5.fit(df_X, df_y)
df_pred = dtc5.predict(df_X)
In [75]:
grouping_cm = fatf_metrics_tools.confusion_matrix_per_subgroup_indexed(
selected_feature_grouping[0],
df_y,
df_pred,
labels=np.unique(df_y).tolist())
In [76]:
print('First subgroup')
print('Targets: ', df_y[selected_feature_grouping[0][0]])
print('Unique targets: ', np.unique(df_y[selected_feature_grouping[0][0]]))
print('Unique predictions: ', np.unique(df_pred[selected_feature_grouping[0][0]]))
print('Confusion matrix: ')
print(grouping_cm[0])
First subgroup
Targets:  [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
Unique targets:  [0]
Unique predictions:  [0]
Confusion matrix: 
[[51  0]
 [ 0  0]]
In [77]:
for i in range(len(grouping_cm)):
    print(i, ' subgroup')
    print('Confusion matrix: ')
    print(grouping_cm[i])
0 subgroup
Confusion matrix: 
[[51  0]
 [ 0  0]]
1 subgroup
Confusion matrix: 
[[162   6]
 [  0   0]]
2 subgroup
Confusion matrix: 
[[317  16]
 [  0   0]]
3 subgroup
Confusion matrix: 
[[606  40]
 [  0   0]]
4 subgroup
Confusion matrix: 
[[487  27]
 [  0   0]]
5 subgroup
Confusion matrix: 
[[871  62]
 [  0   0]]
6 subgroup
Confusion matrix: 
[[1115   60]
 [   0    0]]
7 subgroup
Confusion matrix: 
[[400  33]
 [  0   0]]
8 subgroup
Confusion matrix: 
[[8826 1675]
 [   0    0]]
9 subgroup
Confusion matrix: 
[[5503  784]
 [ 401  603]]
10 subgroup
Confusion matrix: 
[[945 220]
 [ 76 141]]
11 subgroup
Confusion matrix: 
[[732 138]
 [ 70 127]]
12 subgroup
Confusion matrix: 
[[2545  592]
 [ 589 1629]]
13 subgroup
Confusion matrix: 
[[587 226]
 [177 733]]
14 subgroup
Confusion matrix: 
[[ 85  48]
 [ 68 375]]
15 subgroup
Confusion matrix: 
[[ 44  37]
 [ 63 269]]
16 subgroup
Confusion matrix: 
[[0 0]
 [0 0]]
In [78]:
groups_acc = []
for i in range(len(grouping_cm)):
    group_acc = fatf_metrics.accuracy(grouping_cm[i])
    print(group_acc)
    groups_acc.append(group_acc)
1.0
0.9642857142857143
0.9519519519519519
0.9380804953560371
0.9474708171206225
0.9335476956055734
0.948936170212766
0.9237875288683602
0.8404913817731645
0.8374708544781237
0.7858176555716353
0.8050609184629803
0.7794584500466853
0.7661056297156124
0.7986111111111112
0.7578692493946732
0
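Note the final accuracy printed is 0: the "16.5 < x" group is empty, so its confusion matrix is all zeros and trace/sum would be 0/0. A guarded version of the computation (a sketch; the FAT Forensics implementation may handle empty groups differently):

```python
import numpy as np

def safe_accuracy(cm):
    cm = np.asarray(cm)
    total = cm.sum()
    # An empty subgroup yields an all-zero matrix; report 0 instead of 0/0
    return np.trace(cm) / total if total > 0 else 0

print(safe_accuracy([[51, 0], [0, 0]]))  # → 1.0
print(safe_accuracy([[0, 0], [0, 0]]))   # → 0
```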
In [79]:
fatf_accountability_models.systematic_performance_bias_grid(groups_acc)
Out[79]:
array([[False, False, False, False, False, False, False, False, False,
False, True, True, True, True, True, True, True],
[False, False, False, False, False, False, False, False, False,
False, True, False, True, True, True, True, True],
[False, False, False, False, False, False, False, False, False,
False, True, False, True, True, False, True, True],
[False, False, False, False, False, False, False, False, False,
False, False, False, True, True, False, True, True],
[False, False, False, False, False, False, False, False, False,
False, True, False, True, True, False, True, True],
[False, False, False, False, False, False, False, False, False,
False, False, False, False, True, False, True, True],
[False, False, False, False, False, False, False, False, False,
False, True, False, True, True, False, True, True],
[False, False, False, False, False, False, False, False, False,
False, False, False, False, True, False, True, True],
[False, False, False, False, False, False, False, False, False,
False, False, False, False, False, False, False, True],
[False, False, False, False, False, False, False, False, False,
False, False, False, False, False, False, False, True],
[ True, True, True, False, True, False, True, False, False,
False, False, False, False, False, False, False, True],
[ True, False, False, False, False, False, False, False, False,
False, False, False, False, False, False, False, True],
[ True, True, True, True, True, False, True, False, False,
False, False, False, False, False, False, False, True],
[ True, True, True, True, True, True, True, True, False,
False, False, False, False, False, False, False, True],
[ True, True, False, False, False, False, False, False, False,
False, False, False, False, False, False, False, True],
[ True, True, True, True, True, True, True, True, False,
False, False, False, False, False, False, False, True],
[ True, True, True, True, True, True, True, True, True,
True, True, True, True, True, True, True, False]])
In [80]:
# Select a predictive performance metric
predictive_performance_metric = 'accuracy'
#predictive_performance_metric = 'positive predictive value' # Notice a class index needs to be selected using "label_index" property
#predictive_performance_metric = 'false negative rate' # Notice a class index needs to be selected using "label_index" property
# Select a feature for which the difference in performance should be measured
selected_feature_index = 1 # before was 2, in FAT-forensics example was 1
selected_feature_name = df_feature_names[selected_feature_index]
# Define grouping on the selected feature
selected_feature_groups = [1.5,2.5,3.5,4.5,5.5,6.5,7.5,8.5,9.5,10.5,11.5,12.5,13.5,14.5,15.5,16.5]
selected_feature_grouping = fatf_data_tools.group_by_column(
df_X,
selected_feature_index,
groupings=selected_feature_groups)
print('Grouping using variable', df_feature_names[selected_feature_index], ' as ', selected_feature_grouping[1])
grouping_cm = fatf_metrics_tools.confusion_matrix_per_subgroup_indexed(
selected_feature_grouping[0],
df_y,
df_pred,
labels=np.unique(df_y).tolist())
print('Resulting confusion matrices:')
for i in range(len(grouping_cm)):
    print('Confusion matrix', i, ':')
    print(grouping_cm[i])
# Compute performance per group in the feature
population_metrics, population_names = fatf_smt.performance_per_subgroup(
df_X,
df_y,
df_pred,
selected_feature_index,
groupings=selected_feature_groups,
metric=predictive_performance_metric,
label_index = 2)
#print(population_metrics)
# Print out performance per grouping
print('The *{}* for groups defined on "{}" feature (feature index '
'{}):'.format(predictive_performance_metric, selected_feature_name,
selected_feature_index))
for p_name, p_metric in zip(population_names, population_metrics):
    print(' * For the population split *{}* the {} is: '
          '{:.2f}.'.format(p_name, predictive_performance_metric, p_metric))
# Evaluate Systematic Performance Bias
bias_grid = fatf_accountability_models.systematic_performance_bias_grid(population_metrics)
print('Systematic performance grid check:')
print(bias_grid)
# Print out Systematic Performance Bias for each grouping pair
print('\nThe *{}-based* Systematic Performance Bias for *{}* feature (index '
'{}) grouping is:'.format(predictive_performance_metric,
selected_feature_name, selected_feature_index))
for grouping_i, grouping_name_i in enumerate(population_names):
    j_offset = grouping_i + 1
    for grouping_j, grouping_name_j in enumerate(population_names[j_offset:]):
        grouping_j += j_offset
        is_not = '' if bias_grid[grouping_i, grouping_j] else ' NO'
        print(' * For "{}" and "{}" groupings there is{} Systematic '
              'Performance Bias.'.format(grouping_name_i, grouping_name_j,
                                         is_not))
Grouping using variable education-num as ['x <= 1.5', '1.5 < x <= 2.5', '2.5 < x <= 3.5', '3.5 < x <= 4.5', '4.5 < x <= 5.5', '5.5 < x <= 6.5', '6.5 < x <= 7.5', '7.5 < x <= 8.5', '8.5 < x <= 9.5', '9.5 < x <= 10.5', '10.5 < x <= 11.5', '11.5 < x <= 12.5', '12.5 < x <= 13.5', '13.5 < x <= 14.5', '14.5 < x <= 15.5', '15.5 < x <= 16.5', '16.5 < x']
Resulting confusion matrices:
Confusion matrix 0 :
[[51 0]
[ 0 0]]
Confusion matrix 1 :
[[162 6]
[ 0 0]]
Confusion matrix 2 :
[[317 16]
[ 0 0]]
Confusion matrix 3 :
[[606 40]
[ 0 0]]
Confusion matrix 4 :
[[487 27]
[ 0 0]]
Confusion matrix 5 :
[[871 62]
[ 0 0]]
Confusion matrix 6 :
[[1115 60]
[ 0 0]]
Confusion matrix 7 :
[[400 33]
[ 0 0]]
Confusion matrix 8 :
[[8826 1675]
[ 0 0]]
Confusion matrix 9 :
[[5503 784]
[ 401 603]]
Confusion matrix 10 :
[[945 220]
[ 76 141]]
Confusion matrix 11 :
[[732 138]
[ 70 127]]
Confusion matrix 12 :
[[2545 592]
[ 589 1629]]
Confusion matrix 13 :
[[587 226]
[177 733]]
Confusion matrix 14 :
[[ 85 48]
[ 68 375]]
Confusion matrix 15 :
[[ 44 37]
[ 63 269]]
Confusion matrix 16 :
[[0 0]
[0 0]]
The *accuracy* for groups defined on "education-num" feature (feature index 1):
* For the population split *x <= 1.5* the accuracy is: 1.00.
* For the population split *1.5 < x <= 2.5* the accuracy is: 0.96.
* For the population split *2.5 < x <= 3.5* the accuracy is: 0.95.
* For the population split *3.5 < x <= 4.5* the accuracy is: 0.94.
* For the population split *4.5 < x <= 5.5* the accuracy is: 0.95.
* For the population split *5.5 < x <= 6.5* the accuracy is: 0.93.
* For the population split *6.5 < x <= 7.5* the accuracy is: 0.95.
* For the population split *7.5 < x <= 8.5* the accuracy is: 0.92.
* For the population split *8.5 < x <= 9.5* the accuracy is: 0.84.
* For the population split *9.5 < x <= 10.5* the accuracy is: 0.84.
* For the population split *10.5 < x <= 11.5* the accuracy is: 0.79.
* For the population split *11.5 < x <= 12.5* the accuracy is: 0.81.
* For the population split *12.5 < x <= 13.5* the accuracy is: 0.78.
* For the population split *13.5 < x <= 14.5* the accuracy is: 0.77.
* For the population split *14.5 < x <= 15.5* the accuracy is: 0.80.
* For the population split *15.5 < x <= 16.5* the accuracy is: 0.76.
* For the population split *16.5 < x* the accuracy is: 0.00.
Systematic performance grid check:
[[False False False False False False False False False False True True
True True True True True]
[False False False False False False False False False False True False
True True True True True]
[False False False False False False False False False False True False
True True False True True]
[False False False False False False False False False False False False
True True False True True]
[False False False False False False False False False False True False
True True False True True]
[False False False False False False False False False False False False
False True False True True]
[False False False False False False False False False False True False
True True False True True]
[False False False False False False False False False False False False
False True False True True]
[False False False False False False False False False False False False
False False False False True]
[False False False False False False False False False False False False
False False False False True]
[ True True True False True False True False False False False False
False False False False True]
[ True False False False False False False False False False False False
False False False False True]
[ True True True True True False True False False False False False
False False False False True]
[ True True True True True True True True False False False False
False False False False True]
[ True True False False False False False False False False False False
False False False False True]
[ True True True True True True True True False False False False
False False False False True]
[ True True True True True True True True True True True True
True True True True False]]
The *accuracy-based* Systematic Performance Bias for *education-num* feature (index 1) grouping is:
* For "x <= 1.5" and "1.5 < x <= 2.5" groupings there is NO Systematic Performance Bias.
* For "x <= 1.5" and "2.5 < x <= 3.5" groupings there is NO Systematic Performance Bias.
* For "x <= 1.5" and "3.5 < x <= 4.5" groupings there is NO Systematic Performance Bias.
* For "x <= 1.5" and "4.5 < x <= 5.5" groupings there is NO Systematic Performance Bias.
* For "x <= 1.5" and "5.5 < x <= 6.5" groupings there is NO Systematic Performance Bias.
* For "x <= 1.5" and "6.5 < x <= 7.5" groupings there is NO Systematic Performance Bias.
* For "x <= 1.5" and "7.5 < x <= 8.5" groupings there is NO Systematic Performance Bias.
* For "x <= 1.5" and "8.5 < x <= 9.5" groupings there is NO Systematic Performance Bias.
* For "x <= 1.5" and "9.5 < x <= 10.5" groupings there is NO Systematic Performance Bias.
* For "x <= 1.5" and "10.5 < x <= 11.5" groupings there is Systematic Performance Bias.
* For "x <= 1.5" and "11.5 < x <= 12.5" groupings there is Systematic Performance Bias.
* For "x <= 1.5" and "12.5 < x <= 13.5" groupings there is Systematic Performance Bias.
* For "x <= 1.5" and "13.5 < x <= 14.5" groupings there is Systematic Performance Bias.
* For "x <= 1.5" and "14.5 < x <= 15.5" groupings there is Systematic Performance Bias.
* For "x <= 1.5" and "15.5 < x <= 16.5" groupings there is Systematic Performance Bias.
* For "x <= 1.5" and "16.5 < x" groupings there is Systematic Performance Bias.
* For "1.5 < x <= 2.5" and "2.5 < x <= 3.5" groupings there is NO Systematic Performance Bias.
* For "1.5 < x <= 2.5" and "3.5 < x <= 4.5" groupings there is NO Systematic Performance Bias.
* For "1.5 < x <= 2.5" and "4.5 < x <= 5.5" groupings there is NO Systematic Performance Bias.
* For "1.5 < x <= 2.5" and "5.5 < x <= 6.5" groupings there is NO Systematic Performance Bias.
* For "1.5 < x <= 2.5" and "6.5 < x <= 7.5" groupings there is NO Systematic Performance Bias.
* For "1.5 < x <= 2.5" and "7.5 < x <= 8.5" groupings there is NO Systematic Performance Bias.
* For "1.5 < x <= 2.5" and "8.5 < x <= 9.5" groupings there is NO Systematic Performance Bias.
* For "1.5 < x <= 2.5" and "9.5 < x <= 10.5" groupings there is NO Systematic Performance Bias.
* For "1.5 < x <= 2.5" and "10.5 < x <= 11.5" groupings there is Systematic Performance Bias.
* For "1.5 < x <= 2.5" and "11.5 < x <= 12.5" groupings there is NO Systematic Performance Bias.
* For "1.5 < x <= 2.5" and "12.5 < x <= 13.5" groupings there is Systematic Performance Bias.
* For "1.5 < x <= 2.5" and "13.5 < x <= 14.5" groupings there is Systematic Performance Bias.
* For "1.5 < x <= 2.5" and "14.5 < x <= 15.5" groupings there is Systematic Performance Bias.
* For "1.5 < x <= 2.5" and "15.5 < x <= 16.5" groupings there is Systematic Performance Bias.
* For "1.5 < x <= 2.5" and "16.5 < x" groupings there is Systematic Performance Bias.
* For "2.5 < x <= 3.5" and "3.5 < x <= 4.5" groupings there is NO Systematic Performance Bias.
* For "2.5 < x <= 3.5" and "4.5 < x <= 5.5" groupings there is NO Systematic Performance Bias.
* For "2.5 < x <= 3.5" and "5.5 < x <= 6.5" groupings there is NO Systematic Performance Bias.
* For "2.5 < x <= 3.5" and "6.5 < x <= 7.5" groupings there is NO Systematic Performance Bias.
* For "2.5 < x <= 3.5" and "7.5 < x <= 8.5" groupings there is NO Systematic Performance Bias.
* For "2.5 < x <= 3.5" and "8.5 < x <= 9.5" groupings there is NO Systematic Performance Bias.
* For "2.5 < x <= 3.5" and "9.5 < x <= 10.5" groupings there is NO Systematic Performance Bias.
* For "2.5 < x <= 3.5" and "10.5 < x <= 11.5" groupings there is Systematic Performance Bias.
* For "2.5 < x <= 3.5" and "11.5 < x <= 12.5" groupings there is NO Systematic Performance Bias.
* For "2.5 < x <= 3.5" and "12.5 < x <= 13.5" groupings there is Systematic Performance Bias.
* For "2.5 < x <= 3.5" and "13.5 < x <= 14.5" groupings there is Systematic Performance Bias.
* For "2.5 < x <= 3.5" and "14.5 < x <= 15.5" groupings there is NO Systematic Performance Bias.
* For "2.5 < x <= 3.5" and "15.5 < x <= 16.5" groupings there is Systematic Performance Bias.
* For "2.5 < x <= 3.5" and "16.5 < x" groupings there is Systematic Performance Bias.
* For "3.5 < x <= 4.5" and "4.5 < x <= 5.5" groupings there is NO Systematic Performance Bias.
* For "3.5 < x <= 4.5" and "5.5 < x <= 6.5" groupings there is NO Systematic Performance Bias.
* For "3.5 < x <= 4.5" and "6.5 < x <= 7.5" groupings there is NO Systematic Performance Bias.
* For "3.5 < x <= 4.5" and "7.5 < x <= 8.5" groupings there is NO Systematic Performance Bias.
* For "3.5 < x <= 4.5" and "8.5 < x <= 9.5" groupings there is NO Systematic Performance Bias.
* For "3.5 < x <= 4.5" and "9.5 < x <= 10.5" groupings there is NO Systematic Performance Bias.
* For "3.5 < x <= 4.5" and "10.5 < x <= 11.5" groupings there is NO Systematic Performance Bias.
* For "3.5 < x <= 4.5" and "11.5 < x <= 12.5" groupings there is NO Systematic Performance Bias.
* For "3.5 < x <= 4.5" and "12.5 < x <= 13.5" groupings there is Systematic Performance Bias.
* For "3.5 < x <= 4.5" and "13.5 < x <= 14.5" groupings there is Systematic Performance Bias.
* For "3.5 < x <= 4.5" and "14.5 < x <= 15.5" groupings there is NO Systematic Performance Bias.
* For "3.5 < x <= 4.5" and "15.5 < x <= 16.5" groupings there is Systematic Performance Bias.
* For "3.5 < x <= 4.5" and "16.5 < x" groupings there is Systematic Performance Bias.
* For "4.5 < x <= 5.5" and "5.5 < x <= 6.5" groupings there is NO Systematic Performance Bias.
* For "4.5 < x <= 5.5" and "6.5 < x <= 7.5" groupings there is NO Systematic Performance Bias.
* For "4.5 < x <= 5.5" and "7.5 < x <= 8.5" groupings there is NO Systematic Performance Bias.
* For "4.5 < x <= 5.5" and "8.5 < x <= 9.5" groupings there is NO Systematic Performance Bias.
* For "4.5 < x <= 5.5" and "9.5 < x <= 10.5" groupings there is NO Systematic Performance Bias.
* For "4.5 < x <= 5.5" and "10.5 < x <= 11.5" groupings there is Systematic Performance Bias.
* For "4.5 < x <= 5.5" and "11.5 < x <= 12.5" groupings there is NO Systematic Performance Bias.
* For "4.5 < x <= 5.5" and "12.5 < x <= 13.5" groupings there is Systematic Performance Bias.
* For "4.5 < x <= 5.5" and "13.5 < x <= 14.5" groupings there is Systematic Performance Bias.
* For "4.5 < x <= 5.5" and "14.5 < x <= 15.5" groupings there is NO Systematic Performance Bias.
* For "4.5 < x <= 5.5" and "15.5 < x <= 16.5" groupings there is Systematic Performance Bias.
* For "4.5 < x <= 5.5" and "16.5 < x" groupings there is Systematic Performance Bias.
* For "5.5 < x <= 6.5" and "6.5 < x <= 7.5" groupings there is NO Systematic Performance Bias.
* For "5.5 < x <= 6.5" and "7.5 < x <= 8.5" groupings there is NO Systematic Performance Bias.
* For "5.5 < x <= 6.5" and "8.5 < x <= 9.5" groupings there is NO Systematic Performance Bias.
* For "5.5 < x <= 6.5" and "9.5 < x <= 10.5" groupings there is NO Systematic Performance Bias.
* For "5.5 < x <= 6.5" and "10.5 < x <= 11.5" groupings there is NO Systematic Performance Bias.
* For "5.5 < x <= 6.5" and "11.5 < x <= 12.5" groupings there is NO Systematic Performance Bias.
* For "5.5 < x <= 6.5" and "12.5 < x <= 13.5" groupings there is NO Systematic Performance Bias.
* For "5.5 < x <= 6.5" and "13.5 < x <= 14.5" groupings there is Systematic Performance Bias.
* For "5.5 < x <= 6.5" and "14.5 < x <= 15.5" groupings there is NO Systematic Performance Bias.
* For "5.5 < x <= 6.5" and "15.5 < x <= 16.5" groupings there is Systematic Performance Bias.
* For "5.5 < x <= 6.5" and "16.5 < x" groupings there is Systematic Performance Bias.
* For "6.5 < x <= 7.5" and "7.5 < x <= 8.5" groupings there is NO Systematic Performance Bias.
* For "6.5 < x <= 7.5" and "8.5 < x <= 9.5" groupings there is NO Systematic Performance Bias.
* For "6.5 < x <= 7.5" and "9.5 < x <= 10.5" groupings there is NO Systematic Performance Bias.
* For "6.5 < x <= 7.5" and "10.5 < x <= 11.5" groupings there is Systematic Performance Bias.
* For "6.5 < x <= 7.5" and "11.5 < x <= 12.5" groupings there is NO Systematic Performance Bias.
* For "6.5 < x <= 7.5" and "12.5 < x <= 13.5" groupings there is Systematic Performance Bias.
* For "6.5 < x <= 7.5" and "13.5 < x <= 14.5" groupings there is Systematic Performance Bias.
* For "6.5 < x <= 7.5" and "14.5 < x <= 15.5" groupings there is NO Systematic Performance Bias.
* For "6.5 < x <= 7.5" and "15.5 < x <= 16.5" groupings there is Systematic Performance Bias.
* For "6.5 < x <= 7.5" and "16.5 < x" groupings there is Systematic Performance Bias.
* For "7.5 < x <= 8.5" and "8.5 < x <= 9.5" groupings there is NO Systematic Performance Bias.
* For "7.5 < x <= 8.5" and "9.5 < x <= 10.5" groupings there is NO Systematic Performance Bias.
* For "7.5 < x <= 8.5" and "10.5 < x <= 11.5" groupings there is NO Systematic Performance Bias.
* For "7.5 < x <= 8.5" and "11.5 < x <= 12.5" groupings there is NO Systematic Performance Bias.
* For "7.5 < x <= 8.5" and "12.5 < x <= 13.5" groupings there is NO Systematic Performance Bias.
* For "7.5 < x <= 8.5" and "13.5 < x <= 14.5" groupings there is Systematic Performance Bias.
* For "7.5 < x <= 8.5" and "14.5 < x <= 15.5" groupings there is NO Systematic Performance Bias.
* For "7.5 < x <= 8.5" and "15.5 < x <= 16.5" groupings there is Systematic Performance Bias.
* For "7.5 < x <= 8.5" and "16.5 < x" groupings there is Systematic Performance Bias.
* For "8.5 < x <= 9.5" and "9.5 < x <= 10.5" groupings there is NO Systematic Performance Bias.
* For "8.5 < x <= 9.5" and "10.5 < x <= 11.5" groupings there is NO Systematic Performance Bias.
* For "8.5 < x <= 9.5" and "11.5 < x <= 12.5" groupings there is NO Systematic Performance Bias.
* For "8.5 < x <= 9.5" and "12.5 < x <= 13.5" groupings there is NO Systematic Performance Bias.
* For "8.5 < x <= 9.5" and "13.5 < x <= 14.5" groupings there is NO Systematic Performance Bias.
* For "8.5 < x <= 9.5" and "14.5 < x <= 15.5" groupings there is NO Systematic Performance Bias.
* For "8.5 < x <= 9.5" and "15.5 < x <= 16.5" groupings there is NO Systematic Performance Bias.
* For "8.5 < x <= 9.5" and "16.5 < x" groupings there is Systematic Performance Bias.
* For "9.5 < x <= 10.5" and "10.5 < x <= 11.5" groupings there is NO Systematic Performance Bias.
* For "9.5 < x <= 10.5" and "11.5 < x <= 12.5" groupings there is NO Systematic Performance Bias.
* For "9.5 < x <= 10.5" and "12.5 < x <= 13.5" groupings there is NO Systematic Performance Bias.
* For "9.5 < x <= 10.5" and "13.5 < x <= 14.5" groupings there is NO Systematic Performance Bias.
* For "9.5 < x <= 10.5" and "14.5 < x <= 15.5" groupings there is NO Systematic Performance Bias.
* For "9.5 < x <= 10.5" and "15.5 < x <= 16.5" groupings there is NO Systematic Performance Bias.
* For "9.5 < x <= 10.5" and "16.5 < x" groupings there is Systematic Performance Bias.
* For "10.5 < x <= 11.5" and "11.5 < x <= 12.5" groupings there is NO Systematic Performance Bias.
* For "10.5 < x <= 11.5" and "12.5 < x <= 13.5" groupings there is NO Systematic Performance Bias.
* For "10.5 < x <= 11.5" and "13.5 < x <= 14.5" groupings there is NO Systematic Performance Bias.
* For "10.5 < x <= 11.5" and "14.5 < x <= 15.5" groupings there is NO Systematic Performance Bias.
* For "10.5 < x <= 11.5" and "15.5 < x <= 16.5" groupings there is NO Systematic Performance Bias.
* For "10.5 < x <= 11.5" and "16.5 < x" groupings there is Systematic Performance Bias.
* For "11.5 < x <= 12.5" and "12.5 < x <= 13.5" groupings there is NO Systematic Performance Bias.
* For "11.5 < x <= 12.5" and "13.5 < x <= 14.5" groupings there is NO Systematic Performance Bias.
* For "11.5 < x <= 12.5" and "14.5 < x <= 15.5" groupings there is NO Systematic Performance Bias.
* For "11.5 < x <= 12.5" and "15.5 < x <= 16.5" groupings there is NO Systematic Performance Bias.
* For "11.5 < x <= 12.5" and "16.5 < x" groupings there is Systematic Performance Bias.
* For "12.5 < x <= 13.5" and "13.5 < x <= 14.5" groupings there is NO Systematic Performance Bias.
* For "12.5 < x <= 13.5" and "14.5 < x <= 15.5" groupings there is NO Systematic Performance Bias.
* For "12.5 < x <= 13.5" and "15.5 < x <= 16.5" groupings there is NO Systematic Performance Bias.
* For "12.5 < x <= 13.5" and "16.5 < x" groupings there is Systematic Performance Bias.
* For "13.5 < x <= 14.5" and "14.5 < x <= 15.5" groupings there is NO Systematic Performance Bias.
* For "13.5 < x <= 14.5" and "15.5 < x <= 16.5" groupings there is NO Systematic Performance Bias.
* For "13.5 < x <= 14.5" and "16.5 < x" groupings there is Systematic Performance Bias.
* For "14.5 < x <= 15.5" and "15.5 < x <= 16.5" groupings there is NO Systematic Performance Bias.
* For "14.5 < x <= 15.5" and "16.5 < x" groupings there is Systematic Performance Bias.
* For "15.5 < x <= 16.5" and "16.5 < x" groupings there is Systematic Performance Bias.
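The per-group confusion matrices above come from `confusion_matrix_per_subgroup_indexed`, which builds one confusion matrix per list of row indices. A minimal pure-Python sketch of the idea (assuming rows index the ground truth and columns the prediction; the library's own orientation may differ):

```python
def confusion_matrix(y_true, y_pred, labels):
    """Count (true label, predicted label) pairs into a square matrix."""
    idx = {lab: k for k, lab in enumerate(labels)}
    cm = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        cm[idx[t]][idx[p]] += 1
    return cm

def cm_per_subgroup(groups, y_true, y_pred, labels):
    """One confusion matrix per group, where each group is a list of rows."""
    return [confusion_matrix([y_true[r] for r in rows],
                             [y_pred[r] for r in rows], labels)
            for rows in groups]

# Toy data: six rows split into two subgroups of three rows each.
y_true = [0, 0, 1, 1, 0, 1]
y_pred = [0, 1, 1, 0, 0, 1]
groups = [[0, 1, 2], [3, 4, 5]]
for cm in cm_per_subgroup(groups, y_true, y_pred, labels=[0, 1]):
    print(cm)
```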
Sampling bias and systematic performance bias with sex¶
In [81]:
print(df['sex_encoded'].max())
print(df['sex_encoded'].min())
print(df['sex_encoded'].unique())
1
0
[1 0]
In [82]:
# Select a feature for which the Sampling Bias should be measured
selected_feature_index = 9
selected_feature_name = df_feature_names[selected_feature_index]
# Define grouping on the selected feature
selected_feature_groups = [0.5]
In [83]:
selected_feature_grouping = fatf_data_tools.group_by_column(
df_X,
selected_feature_index,
groupings=selected_feature_groups)
selected_feature_grouping[1]
Out[83]:
['x <= 0.5', '0.5 < x']
In [84]:
print(len(selected_feature_grouping[0][0]))
print(len(selected_feature_grouping[0][1]))
10771
21790
In [85]:
counts_per_grouping = [len(i) for i in selected_feature_grouping[0]]
print(counts_per_grouping)
fatf_accountability_data.sampling_bias_grid_check(counts_per_grouping)
[10771, 21790]
Out[85]:
array([[False, True],
[ True, False]])
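The `sampling_bias_grid_check` flags pairs of groups whose sizes are disparate. As an illustration of such a pairwise check (the actual criterion and default threshold in FAT-forensics may differ), a sketch that flags a pair when the smaller group falls below 80% of the larger:

```python
def sampling_bias_grid(counts, threshold=0.2):
    """Flag a pair of groups when the smaller count is below
    (1 - threshold) of the larger one -- an illustrative rule."""
    n = len(counts)
    grid = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            lo, hi = sorted((counts[i], counts[j]))
            disparate = hi > 0 and lo / hi < 1 - threshold
            grid[i][j] = grid[j][i] = disparate
    return grid

# The two sex_encoded group sizes from the cell above:
# 10771 / 21790 is roughly 0.49, well under 0.8, so the pair is flagged.
print(sampling_bias_grid([10771, 21790]))
```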
In [86]:
# Get the disparity grid
bias_grid = fatf_dam.sampling_bias_grid_check(counts_per_grouping)
print(bias_grid)
# Print out disparity per every grouping pair
print('\nThe Sampling Bias for *{}* feature (index {}) grouping is:'
''.format(selected_feature_name, selected_feature_index))
for grouping_i, grouping_name_i in enumerate(counts_per_grouping):
    j_offset = grouping_i + 1
    for grouping_j, grouping_name_j in enumerate(counts_per_grouping[j_offset:]):
        grouping_j += j_offset
        is_not = '' if bias_grid[grouping_i, grouping_j] else ' NO'
        print(' * For "{}" and "{}" groupings there is{} Sampling Bias.'
              ''.format(grouping_name_i, grouping_name_j, is_not))
[[False True]
[ True False]]
The Sampling Bias for *sex_encoded* feature (index 9) grouping is:
* For "10771" and "21790" groupings there is Sampling Bias.
In [87]:
dtc5 = tree.DecisionTreeClassifier(max_depth=5)
dtc5.fit(df_X, df_y)
df_pred = dtc5.predict(df_X)
In [88]:
grouping_cm = fatf_metrics_tools.confusion_matrix_per_subgroup_indexed(
selected_feature_grouping[0],
df_y,
df_pred,
labels=np.unique(df_y).tolist())
In [89]:
print('First subgroup')
print('Targets: ', df_y[selected_feature_grouping[0][0]])
print('Unique targets: ', np.unique(df_y[selected_feature_grouping[0][0]]))
print('Unique predictions: ', np.unique(df_pred[selected_feature_grouping[0][0]]))
print('Confusion matrix: ')
print(grouping_cm[0])
First subgroup
Targets:  [0 0 0 ... 0 0 1]
Unique targets:  [0 1]
Unique predictions:  [0 1]
Confusion matrix: 
[[9462  764]
 [ 130  415]]
In [90]:
print('Second subgroup')
print('Confusion matrix: ')
print(grouping_cm[1])
Second subgroup
Confusion matrix: 
[[13814  3200]
 [ 1314  3462]]
In [91]:
group_0_acc = fatf_metrics.accuracy(grouping_cm[0])
print(group_0_acc)
group_1_acc = fatf_metrics.accuracy(grouping_cm[1])
print(group_1_acc)
0.9169993501067681
0.7928407526388251
In [92]:
fatf_accountability_models.systematic_performance_bias_grid([group_0_acc, group_1_acc])
Out[92]:
array([[False, False],
[False, False]])
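The two accuracies feeding this grid come straight from the subgroup confusion matrices printed above: accuracy is the diagonal (correctly classified instances) over the total. Recomputing the printed values as a sanity check:

```python
def cm_accuracy(cm):
    """Accuracy = sum of the diagonal / sum of all cells."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total

# Confusion matrices for the two sex_encoded subgroups shown above.
print(cm_accuracy([[9462, 764], [130, 415]]))      # ~0.917 (x <= 0.5)
print(cm_accuracy([[13814, 3200], [1314, 3462]]))  # ~0.793 (0.5 < x)
```

Despite the roughly 0.12 gap, the pair is not flagged, so the check's threshold evidently tolerates a difference of this size.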
In [93]:
# Select a predictive performance metric
predictive_performance_metric = 'accuracy'
#predictive_performance_metric = 'positive predictive value' # Notice a class index needs to be selected using "label_index" property
#predictive_performance_metric = 'false negative rate' # Notice a class index needs to be selected using "label_index" property
# Select a feature for which the difference in performance should be measured
selected_feature_index = 9 # before was 2, in FAT-forensics example was 1
selected_feature_name = df_feature_names[selected_feature_index]
# Define grouping on the selected feature
selected_feature_groups = [0.5]
selected_feature_grouping = fatf_data_tools.group_by_column(
df_X,
selected_feature_index,
groupings=selected_feature_groups)
print('Grouping using variable', df_feature_names[selected_feature_index], ' as ', selected_feature_grouping[1])
grouping_cm = fatf_metrics_tools.confusion_matrix_per_subgroup_indexed(
selected_feature_grouping[0],
df_y,
df_pred,
labels=np.unique(df_y).tolist())
print('Resulting confusion matrices:')
for i in range(len(grouping_cm)):
    print('Confusion matrix', i, ':')
    print(grouping_cm[i])
# Compute performance per group in the feature
population_metrics, population_names = fatf_smt.performance_per_subgroup(
df_X,
df_y,
df_pred,
selected_feature_index,
groupings=selected_feature_groups,
metric=predictive_performance_metric,
label_index = 2)
#print(population_metrics)
# Print out performance per grouping
print('The *{}* for groups defined on "{}" feature (feature index '
'{}):'.format(predictive_performance_metric, selected_feature_name,
selected_feature_index))
for p_name, p_metric in zip(population_names, population_metrics):
    print(' * For the population split *{}* the {} is: '
          '{:.2f}.'.format(p_name, predictive_performance_metric, p_metric))
# Evaluate Systematic Performance Bias
bias_grid = fatf_accountability_models.systematic_performance_bias_grid(population_metrics)
print('Systematic performance grid check:')
print(bias_grid)
# Print out Systematic Performance Bias for each grouping pair
print('\nThe *{}-based* Systematic Performance Bias for *{}* feature (index '
'{}) grouping is:'.format(predictive_performance_metric,
selected_feature_name, selected_feature_index))
for grouping_i, grouping_name_i in enumerate(population_names):
    j_offset = grouping_i + 1
    for grouping_j, grouping_name_j in enumerate(population_names[j_offset:]):
        grouping_j += j_offset
        is_not = '' if bias_grid[grouping_i, grouping_j] else ' NO'
        print(' * For "{}" and "{}" groupings there is{} Systematic '
              'Performance Bias.'.format(grouping_name_i, grouping_name_j,
                                         is_not))
Grouping using variable sex_encoded as ['x <= 0.5', '0.5 < x']
Resulting confusion matrices:
Confusion matrix 0 :
[[9462 764]
[ 130 415]]
Confusion matrix 1 :
[[13814 3200]
[ 1314 3462]]
The *accuracy* for groups defined on "sex_encoded" feature (feature index 9):
* For the population split *x <= 0.5* the accuracy is: 0.92.
* For the population split *0.5 < x* the accuracy is: 0.79.
Systematic performance grid check:
[[False False]
[False False]]
The *accuracy-based* Systematic Performance Bias for *sex_encoded* feature (index 9) grouping is:
* For "x <= 0.5" and "0.5 < x" groupings there is NO Systematic Performance Bias.
Sampling bias and systematic performance bias with age¶
In [94]:
print(df['age'].max())
print(df['age'].min())
print(df['age'].unique())
90
17
[39 50 38 53 28 37 49 52 31 42 30 23 32 40 34 25 43 54 35 59 56 19 20 45
 22 48 21 24 57 44 41 29 18 47 46 36 79 27 67 33 76 17 55 61 70 64 71 68
 66 51 58 26 60 90 75 65 77 62 63 80 72 74 69 73 81 78 88 82 83 84 85 86
 87]
In [95]:
# Select a feature for which the Sampling Bias should be measured
selected_feature_index = 0
selected_feature_name = df_feature_names[selected_feature_index]
# Define grouping on the selected feature
selected_feature_groups = [25,45,75,90]
In [96]:
selected_feature_grouping = fatf_data_tools.group_by_column(
df_X,
selected_feature_index,
groupings=selected_feature_groups)
selected_feature_grouping[1]
Out[96]:
['x <= 25', '25 < x <= 45', '45 < x <= 75', '75 < x <= 90', '90 < x']
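With thresholds `[25, 45, 75, 90]`, `group_by_column` yields the five half-open intervals listed above. A small sketch of the same binning rule (assuming `x <= t` boundaries, as the printed group names suggest; the library's implementation may differ):

```python
import bisect

def group_rows_by_thresholds(column, thresholds):
    """Assign each value x to bin b where thresholds[b-1] < x <= thresholds[b];
    values above the last threshold land in the final bin."""
    groups = [[] for _ in range(len(thresholds) + 1)]
    for row, x in enumerate(column):
        groups[bisect.bisect_left(thresholds, x)].append(row)
    return groups

# Toy ages straddling each boundary; 91 falls into the last (90 < x) bin.
ages = [17, 25, 26, 45, 46, 75, 76, 90, 91]
groups = group_rows_by_thresholds(ages, [25, 45, 75, 90])
print([len(g) for g in groups])  # -> [2, 2, 2, 2, 1]
```

On the census data the last bin is empty, since the maximum age is exactly 90.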
In [97]:
print(len(selected_feature_grouping[0][0]))
print(len(selected_feature_grouping[0][1]))
print(len(selected_feature_grouping[0][2]))
print(len(selected_feature_grouping[0][3]))
6411
16523
9386
241
In [98]:
counts_per_grouping = [len(i) for i in selected_feature_grouping[0]]
print(counts_per_grouping)
fatf_accountability_data.sampling_bias_grid_check(counts_per_grouping)
[6411, 16523, 9386, 241, 0]
Out[98]:
array([[False, True, True, True, True],
[ True, False, True, True, True],
[ True, True, False, True, True],
[ True, True, True, False, True],
[ True, True, True, True, False]])
In [99]:
# Get the disparity grid
bias_grid = fatf_dam.sampling_bias_grid_check(counts_per_grouping)
print(bias_grid)
# Print out disparity per every grouping pair
print('\nThe Sampling Bias for *{}* feature (index {}) grouping is:'
''.format(selected_feature_name, selected_feature_index))
for grouping_i, grouping_name_i in enumerate(counts_per_grouping):
    j_offset = grouping_i + 1
    for grouping_j, grouping_name_j in enumerate(counts_per_grouping[j_offset:]):
        grouping_j += j_offset
        is_not = '' if bias_grid[grouping_i, grouping_j] else ' NO'
        print(' * For "{}" and "{}" groupings there is{} Sampling Bias.'
              ''.format(grouping_name_i, grouping_name_j, is_not))
[[False True True True True]
[ True False True True True]
[ True True False True True]
[ True True True False True]
[ True True True True False]]
The Sampling Bias for *age* feature (index 0) grouping is:
* For "6411" and "16523" groupings there is Sampling Bias.
* For "6411" and "9386" groupings there is Sampling Bias.
* For "6411" and "241" groupings there is Sampling Bias.
* For "6411" and "0" groupings there is Sampling Bias.
* For "16523" and "9386" groupings there is Sampling Bias.
* For "16523" and "241" groupings there is Sampling Bias.
* For "16523" and "0" groupings there is Sampling Bias.
* For "9386" and "241" groupings there is Sampling Bias.
* For "9386" and "0" groupings there is Sampling Bias.
* For "241" and "0" groupings there is Sampling Bias.
In [100]:
dtc5 = tree.DecisionTreeClassifier(max_depth=5)
dtc5.fit(df_X, df_y)
df_pred = dtc5.predict(df_X)
In [101]:
grouping_cm = fatf_metrics_tools.confusion_matrix_per_subgroup_indexed(
selected_feature_grouping[0],
df_y,
df_pred,
labels=np.unique(df_y).tolist())
In [102]:
print('First subgroup')
print('Targets: ', df_y[selected_feature_grouping[0][0]])
print('Unique targets: ', np.unique(df_y[selected_feature_grouping[0][0]]))
print('Unique predictions: ', np.unique(df_pred[selected_feature_grouping[0][0]]))
print('Confusion matrix: ')
print(grouping_cm[0])
First subgroup
Targets:  [0 0 0 ... 0 0 0]
Unique targets:  [0 1]
Unique predictions:  [0 1]
Confusion matrix: 
[[6293  112]
 [   4    2]]
In [103]:
print('Second subgroup')
print('Confusion matrix: ')
print(grouping_cm[1])
print('Third subgroup')
print('Confusion matrix: ')
print(grouping_cm[2])
print('Fourth subgroup')
print('Confusion matrix: ')
print(grouping_cm[3])
Second subgroup
Confusion matrix: 
[[11306  2179]
 [  852  2186]]
Third subgroup
Confusion matrix: 
[[5491 1650]
 [ 573 1672]]
Fourth subgroup
Confusion matrix: 
[[186  23]
 [ 15  17]]
In [104]:
group_0_acc = fatf_metrics.accuracy(grouping_cm[0])
print(group_0_acc)
group_1_acc = fatf_metrics.accuracy(grouping_cm[1])
print(group_1_acc)
group_2_acc = fatf_metrics.accuracy(grouping_cm[2])
print(group_2_acc)
group_3_acc = fatf_metrics.accuracy(grouping_cm[3])
print(group_3_acc)
0.9819060988925284
0.8165587363069661
0.7631578947368421
0.8423236514522822
In [105]:
fatf_accountability_models.systematic_performance_bias_grid([group_0_acc, group_1_acc,group_2_acc,group_3_acc])
Out[105]:
array([[False, True, True, False],
[ True, False, False, False],
[ True, False, False, False],
[False, False, False, False]])
In [106]:
# Select a predictive performance metric
predictive_performance_metric = 'accuracy'
#predictive_performance_metric = 'positive predictive value' # Notice a class index needs to be selected using "label_index" property
#predictive_performance_metric = 'false negative rate' # Notice a class index needs to be selected using "label_index" property
# Select a feature for which the difference in performance should be measured
selected_feature_index = 0 # before was 2, in FAT-forensics example was 1
selected_feature_name = df_feature_names[selected_feature_index]
# Define grouping on the selected feature
selected_feature_groups = [25, 45, 75, 90]  # note: the last bin, '90 < x', is empty in this dataset
selected_feature_grouping = fatf_data_tools.group_by_column(
df_X,
selected_feature_index,
groupings=selected_feature_groups)
print('Grouping using variable', df_feature_names[selected_feature_index], ' as ', selected_feature_grouping[1])
grouping_cm = fatf_metrics_tools.confusion_matrix_per_subgroup_indexed(
selected_feature_grouping[0],
df_y,
df_pred,
labels=np.unique(df_y).tolist())
print('Resulting confusion matrices:')
for i in range(len(grouping_cm)):
print('Confusion matrix', i, ':')
print(grouping_cm[i])
# Compute performance per group in the feature
population_metrics, population_names = fatf_smt.performance_per_subgroup(
df_X,
df_y,
df_pred,
selected_feature_index,
groupings=selected_feature_groups,
metric=predictive_performance_metric,
    label_index=2)  # only used by per-class metrics (e.g. PPV, FNR); ignored for accuracy
#print(population_metrics)
# Print out performance per grouping
print('The *{}* for groups defined on "{}" feature (feature index '
'{}):'.format(predictive_performance_metric, selected_feature_name,
selected_feature_index))
for p_name, p_metric in zip(population_names, population_metrics):
print(' * For the population split *{}* the {} is: '
'{:.2f}.'.format(p_name, predictive_performance_metric, p_metric))
# Evaluate Systematic Performance Bias
bias_grid = fatf_accountability_models.systematic_performance_bias_grid(population_metrics)
print('Systematic performance grid check:')
print(bias_grid)
# Print out Systematic Performance Bias for each grouping pair
print('\nThe *{}-based* Systematic Performance Bias for *{}* feature (index '
'{}) grouping is:'.format(predictive_performance_metric,
selected_feature_name, selected_feature_index))
for grouping_i, grouping_name_i in enumerate(population_names):
j_offset = grouping_i + 1
for grouping_j, grouping_name_j in enumerate(population_names[j_offset:]):
grouping_j += j_offset
is_not = '' if bias_grid[grouping_i, grouping_j] else ' NO'
print(' * For "{}" and "{}" groupings there is{} Systematic '
'Performance Bias.'.format(grouping_name_i, grouping_name_j,
is_not))
Grouping using variable age as ['x <= 25', '25 < x <= 45', '45 < x <= 75', '75 < x <= 90', '90 < x']
Resulting confusion matrices:
Confusion matrix 0 :
[[6293 112]
[ 4 2]]
Confusion matrix 1 :
[[11306 2179]
[ 852 2186]]
Confusion matrix 2 :
[[5491 1650]
[ 573 1672]]
Confusion matrix 3 :
[[186 23]
[ 15 17]]
Confusion matrix 4 :
[[0 0]
[0 0]]
The *accuracy* for groups defined on "age" feature (feature index 0):
* For the population split *x <= 25* the accuracy is: 0.98.
* For the population split *25 < x <= 45* the accuracy is: 0.82.
* For the population split *45 < x <= 75* the accuracy is: 0.76.
* For the population split *75 < x <= 90* the accuracy is: 0.84.
* For the population split *90 < x* the accuracy is: 0.00.
Systematic performance grid check:
[[False True True False True]
[ True False False False True]
[ True False False False True]
[False False False False True]
[ True True True True False]]
The *accuracy-based* Systematic Performance Bias for *age* feature (index 0) grouping is:
* For "x <= 25" and "25 < x <= 45" groupings there is Systematic Performance Bias.
* For "x <= 25" and "45 < x <= 75" groupings there is Systematic Performance Bias.
* For "x <= 25" and "75 < x <= 90" groupings there is NO Systematic Performance Bias.
* For "x <= 25" and "90 < x" groupings there is Systematic Performance Bias.
* For "25 < x <= 45" and "45 < x <= 75" groupings there is NO Systematic Performance Bias.
* For "25 < x <= 45" and "75 < x <= 90" groupings there is NO Systematic Performance Bias.
* For "25 < x <= 45" and "90 < x" groupings there is Systematic Performance Bias.
* For "45 < x <= 75" and "75 < x <= 90" groupings there is NO Systematic Performance Bias.
* For "45 < x <= 75" and "90 < x" groupings there is Systematic Performance Bias.
* For "75 < x <= 90" and "90 < x" groupings there is Systematic Performance Bias.
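One caveat with this run: the fifth bin (`90 < x`) contains no rows, so its confusion matrix is all zeros and its "accuracy" of 0.00 is an artefact rather than a real performance gap, which is why every comparison against it is flagged as biased. A simple guard is to drop empty groups before the bias check; the sketch below uses the group sizes implied by the confusion matrices above (each size is the sum of one matrix's entries):

```python
# Group sizes derived from the confusion matrices printed above
# (6293+112+4+2 = 6411, and so on); the fifth group is empty.
group_sizes = [6411, 16523, 9386, 241, 0]
group_metrics = [0.98, 0.82, 0.76, 0.84, 0.00]

# Keep only the metrics of non-empty groups before checking for bias
kept = [m for m, n in zip(group_metrics, group_sizes) if n > 0]
print(kept)  # → [0.98, 0.82, 0.76, 0.84]
```

The filtered list can then be passed to `systematic_performance_bias_grid`, restoring the four-group result obtained earlier.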
In [ ]: